Article

Multimodal Emotion Recognition in English Conversations Using Fusion of NLP and Computer Vision Techniques

A. Venu Gopal Reddy, Siddhartha Academy of Higher Education (SAHE) Deemed to be University, Department of English, Vijayawada, Andhra Pradesh, India
Rashmi B, Vemana Institute of Technology, Department of ECE, Bengaluru, Karnataka, India, 560034
Dilfuza Gulyamova, University of Information Technologies named after Muhammad al-Khwarizmi, Computer Engineering Department, Tashkent, Uzbekistan
G. S. Bansode, KL (Deemed to Be) University (KLEF), Department of English, Guntur, Andhra Pradesh, India
Pavan Kumar Nowbattula
Anuradha. S

2026

ABI

Abstract

Accurate emotion recognition is vital for effective human-computer interaction. This study proposes a multimodal approach that combines recent NLP and computer vision techniques to identify emotions in English conversations. RoBERTa is used to model linguistic meaning and context, whereas a CNN trained on AffectNet data is used to detect facial expressions. Consistent preprocessing standardizes the data format and the conversation frames. A late-fusion strategy that combines weighted modality streams with a meta-classifier improves the reliability of the predictions. On MELD, the multimodal system reaches 98.7% accuracy, outperforming the text-only and vision-only models. These findings indicate the strong potential of multimodal learning for emotion-aware virtual assistants, affective computing, and intelligent interaction systems.
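The weighted late-fusion step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `text_weight` value of 0.6, and the example logits are hypothetical, and the meta-classifier stage is omitted. The sketch only assumes that each modality produces one logit per MELD emotion label.

```python
import numpy as np

# The seven emotion labels used in the MELD dataset.
EMOTIONS = ["anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"]


def softmax(logits):
    """Convert raw scores to probabilities along the last axis."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def late_fusion(text_logits, face_logits, text_weight=0.6):
    """Weighted average of per-modality class probabilities.

    text_weight is a hypothetical hyperparameter; the abstract does not
    state the weights that were actually used.
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")
    p_text = softmax(text_logits)  # e.g. scores from a RoBERTa classifier head
    p_face = softmax(face_logits)  # e.g. scores from a facial-expression CNN
    return text_weight * p_text + (1.0 - text_weight) * p_face


def predict(text_logits, face_logits, text_weight=0.6):
    """Return the emotion label with the highest fused probability."""
    fused = late_fusion(text_logits, face_logits, text_weight)
    return EMOTIONS[int(np.argmax(fused))]


# Made-up logits: the text stream strongly favors "joy",
# the face stream weakly favors "neutral".
text_logits = [0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0]
face_logits = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
print(predict(text_logits, face_logits))
```

Because fusion happens on the output probabilities, either stream can be retrained or replaced without touching the other. A meta-classifier, as described in the abstract, would take the per-modality probabilities as input features in place of the fixed weighted average.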


Topics

Emotion and Mood Recognition; Speech and dialogue systems; Subtitles and Audiovisual Media

Identifiers

DOI: 10.1109/icaect68478.2026.11426023

Citations and references

0 citations, 13 references
