Multimodal Emotion Recognition in English Conversations Using Fusion of NLP and Computer Vision Techniques
Abstract
Accurate emotion recognition is vital for effective human-computer interaction. This study proposes a multimodal approach that combines recent NLP and computer vision techniques to identify emotions in English conversations. RoBERTa models linguistic meaning and context, while a CNN trained on AffectNet data detects facial expressions. Consistent preprocessing standardizes the data format and the conversational framing. A late-fusion strategy, which combines weighted streams with a meta-classifier, improves the reliability of predictions. On MELD, the multimodal system reaches 98.7% accuracy, outperforming both the text-only and the vision-only models. These findings indicate the strong potential of multimodal learning for emotion-sensitive virtual assistants, affective computing, and intelligent interaction systems.
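The weighted part of a late-fusion step can be illustrated with a minimal sketch. It is not the study's implementation: the function names, the default weight of 0.6, and the label order are assumptions made for illustration, and the label set is the seven emotion classes commonly used with MELD. Each modality is assumed to output a probability vector over the same classes.

```python
from typing import List, Sequence

# Assumed label order (illustrative); the abstract does not list the classes.
EMOTIONS = ["anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"]


def weighted_late_fusion(
    text_probs: Sequence[float],
    vision_probs: Sequence[float],
    text_weight: float = 0.6,  # hypothetical value, not taken from the study
) -> List[float]:
    """Blend per-class probabilities from the text and vision streams."""
    if len(text_probs) != len(vision_probs):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")

    vision_weight = 1.0 - text_weight
    fused = [
        text_weight * t + vision_weight * v
        for t, v in zip(text_probs, vision_probs)
    ]
    total = sum(fused)
    if total == 0:
        raise ValueError("fused scores sum to zero")
    # Renormalize so the result is again a probability distribution.
    return [p / total for p in fused]


def stack_features(
    text_probs: Sequence[float], vision_probs: Sequence[float]
) -> List[float]:
    """Concatenate both streams as input features for a meta-classifier."""
    return list(text_probs) + list(vision_probs)


def predict_label(probs: Sequence[float], labels: Sequence[str] = EMOTIONS) -> str:
    """Return the label with the highest fused probability."""
    best_index = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best_index]
```

In this sketch, `weighted_late_fusion` covers only the weighted streams. A meta-classifier would be a separate model trained on vectors such as those produced by `stack_features`; its type and training setup are not specified in the abstract, so it is left out here.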