Article

A Hyper-Attentive Multimodal Transformer for Real-Time and Robust Facial Expression Recognition

Zarnigor Tagmatova, Department of Computer Engineering, Gachon University, Sujeong-Gu, Seongnam-si 13120, Gyeonggi-Do, Republic of Korea
Sabina Umirzakova, Department of Computer Engineering, Gachon University, Sujeong-Gu, Seongnam-si 13120, Gyeonggi-Do, Republic of Korea
Alpamis Kutlimuratov, Department of Applied Informatics, Kimyo International University in Tashkent, Tashkent 100121, Uzbekistan
Akmalbek Abdusalomov, Department of Artificial Intelligence, Tashkent State University of Economics, Tashkent 100066, Uzbekistan
Young Im Cho, Department of Computer Engineering, Gachon University, Sujeong-Gu, Seongnam-si 13120, Gyeonggi-Do, Republic of Korea

Applied Sciences (journal), 2025, English

ABI

Abstract

Facial expression recognition (FER) plays a critical role in affective computing, enabling machines to interpret human emotions through facial cues. Although recent deep learning models have made progress, many still fail under real-world conditions such as occlusion, lighting variation, and subtle expressions. In this work, we propose FERONet, a novel hyper-attentive multimodal transformer architecture tailored for robust, real-time FER. FERONet integrates three components. The first is a triple-attention mechanism (spatial, channel, and cross-patch). The second is a hierarchical transformer that uses token merging for computational efficiency. The third is a temporal cross-attention decoder that models emotional dynamics in video sequences. The model fuses RGB, optical flow, and depth/landmark inputs, improving its resilience to environmental variation. Experimental evaluations on five standard FER datasets (FER-2013, RAF-DB, CK+, BU-3DFE, and AFEW) show that FERONet achieves higher recognition accuracy (up to 97.3%) than prior state-of-the-art models while running in real time (<16 ms per frame). These results support the model's suitability for deployment in applications such as intelligent tutoring, driver monitoring, and clinical emotion assessment.
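The abstract attributes part of FERONet's efficiency to token merging but does not state the merging rule. The sketch below is therefore hypothetical and is not the authors' method. It illustrates one simple form of the idea in pure Python: it greedily averages the r most similar non-overlapping pairs of adjacent tokens, scored by cosine similarity. Published schemes such as ToMe use bipartite matching instead. The names `merge_tokens` and `cosine` and the parameter `r` are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def merge_tokens(tokens: List[Vector], r: int) -> List[Vector]:
    """Hypothetical token-merging step: shrink the sequence by up to r tokens.

    Adjacent pairs (i, i + 1) are ranked by cosine similarity, and the r most
    similar non-overlapping pairs are each replaced by their element-wise mean.
    """
    # Rank every adjacent pair, most similar first.
    scored = sorted(
        ((cosine(tokens[i], tokens[i + 1]), i) for i in range(len(tokens) - 1)),
        reverse=True,
    )
    chosen = set()  # left index of each pair selected for merging
    used = set()    # token indices already claimed, so pairs never overlap
    for _, i in scored:
        if len(chosen) >= r:
            break
        if i in used or i + 1 in used:
            continue
        chosen.add(i)
        used.update((i, i + 1))

    merged: List[Vector] = []
    i = 0
    while i < len(tokens):
        if i in chosen:
            # Replace the pair with its element-wise average.
            merged.append([(x + y) / 2.0 for x, y in zip(tokens[i], tokens[i + 1])])
            i += 2
        else:
            merged.append(list(tokens[i]))
            i += 1
    return merged


if __name__ == "__main__":
    toks = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.0, 1.0]]
    print(merge_tokens(toks, 1))  # the two identical tokens collapse into one
```

Merging tokens reduces cost because self-attention compares every pair of tokens, so its cost grows with the square of the token count. For example, halving 196 tokens to 98 cuts the number of pairwise attention scores from 38,416 to 9,604, a 4x reduction in that term.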

Topics

Emotion and Mood Recognition; Face and Expression Recognition; Face Recognition and Analysis

Identifiers

DOI: 10.3390/app15137100

Citations and References

Citations: 6
References cited: 40
