Article

Analytical Modeling of Hybrid CNN-Transformer Dynamics for Emotion Classification

Ergashevich Halimjon Khujamatov (Department of Computer Engineering, Gachon University, Seognam-daero, Sujeong-gu, Seongnam-si 1342, Republic of Korea)

Mirjamol Abdullaev (Department of Information Systems and Technologies, Tashkent State University of Economics, Tashkent 100066, Uzbekistan)

Sabina Umirzakova (Department of Computer Engineering, Gachon University, Seognam-daero, Sujeong-gu, Seongnam-si 1342, Republic of Korea)

Mathematics · journal · 2025 · en


Abstract

Facial expression recognition (FER) is crucial for affective computing and human–computer interaction, yet it remains difficult under real-world variation in lighting, occlusion, and pose. This work presents a lightweight hybrid network, SE-Hybrid + Face-ViT, which merges convolutional and transformer architectures through multi-level feature fusion and adaptive channel attention. The network combines a convolutional stream that captures fine-grained image texture with a retrained Face-ViT branch that provides high-level semantic context. Squeeze-and-Excitation (SE) modules reweight the channel responses at different levels, allowing the network to focus on emotion-salient cues and suppress redundant features. Trained and tested on the large-scale AffectNet benchmark, the proposed architecture achieved 70.45% accuracy and 68.11% macro-F1, outperforming recent state-of-the-art models such as TBEM-Transformer, FT-CSAT, and HFE-Net by around 2–3%. Grad-CAM visualizations confirmed that the model attends to the most significant facial areas, consistent with its improved recognition of subtle expressions such as fear and contempt. The findings indicate that SE-Hybrid + Face-ViT is a computationally efficient yet highly discriminative FER strategy that preserves local detail while reasoning globally over contextual information.
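The channel reweighting that the abstract attributes to the Squeeze-and-Excitation modules can be illustrated with a minimal sketch of a generic SE block. This is not the authors' implementation: the function name `se_block`, the reduction ratio `r`, the omission of bias terms, and the random (untrained) weights are all assumptions made for illustration, and plain Python lists stand in for tensors so that the example runs without any dependencies.

```python
import math
import random


def se_block(feature_map, w1, w2):
    """Apply a generic squeeze-and-excitation step to a [C][H][W] feature map.

    w1 has shape [C // r][C] and w2 has shape [C][C // r]; biases are
    omitted for brevity (an assumption of this sketch).
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: bottleneck projection (C -> C // r) followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)))
        for row in w1
    ]
    # Excitation, step 2: expansion (C // r -> C) followed by a sigmoid gate.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


# Hypothetical toy input: 4 channels of 2x2 activations, reduction ratio 2.
rng = random.Random(0)
C, H, W, r = 4, 2, 2, 2
x = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
w2 = [[rng.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]

y, gates = se_block(x, w1, w2)
print("per-channel gates:", [round(g, 3) for g in gates])
```

Each gate lies strictly between 0 and 1, so a channel is attenuated rather than removed. In a trained network the two weight matrices would be learned, which is what would let the gates emphasize emotion-salient channels over redundant ones.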

Topics

Emotion and Mood Recognition · EEG and Brain-Computer Interfaces · Face recognition and analysis

Identifiers

DOI: 10.3390/math14010085

Citations and references

Cited by 0 · 0 references
