The Social Arc: A Memory-Augmented Graph Network for Multimodal Interaction Understanding
Abstract
Understanding human emotion in conversation requires interpreting not only the multimodal cues of a single utterance but also the broader conversational context. Most existing models treat speakers or utterances in isolation and therefore fail to capture the long-term, dynamic history of multi-party interactions. To address this gap, we propose SocialArcNet, a novel architecture that explicitly models the social arc of a conversation. Our model integrates powerful unimodal backbones with a recurrent Graph Neural Network (GNN) that functions as a social memory. By maintaining and updating a hidden state for each speaker as a distinct node in the graph, SocialArcNet tracks the evolving affective trajectory of each participant. On the MELD dataset, our approach achieves a competitive weighted F1-score of 0.62, outperforming current baselines. These results indicate that modeling the dynamic speaker state is a crucial strategy for contextual emotion recognition. Furthermore, we highlight the critical role of advanced loss functions and regularization in overcoming the severe class imbalance and overfitting inherent in this domain. Our code is available at https://github.com/multi-modal-rtm/Multimodal-Social-GNN.git
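To make the idea of a per-speaker social memory concrete, the following is a minimal sketch in plain Python, not the SocialArcNet implementation. It assumes each utterance has already been encoded as a fixed-length feature vector, and it replaces the recurrent GNN update with a deliberately simple stand-in: the speaking node averages the states of the other speakers as its incoming message, then blends a tanh candidate into its own hidden state with a fixed gate. The function name `update_speaker_states`, the `gate` parameter, and the toy dialogue are all hypothetical.

```python
import math


def update_speaker_states(states, speaker, utterance, gate=0.5):
    """Return a new speaker-state dict after one utterance.

    states:    dict mapping speaker name -> hidden state (list of floats)
    speaker:   name of the participant who produced the utterance
    utterance: feature vector for the utterance (list of floats)
    gate:      fixed blend weight (an assumption; a real model would learn it)
    """
    dim = len(utterance)
    # A speaker seen for the first time starts from a zero state.
    current = states.get(speaker, [0.0] * dim)

    # Message from the rest of the graph: mean state of all other speakers.
    others = [state for name, state in states.items() if name != speaker]
    if others:
        context = [sum(column) / len(others) for column in zip(*others)]
    else:
        context = [0.0] * dim

    # Candidate state combines the new utterance with the social context.
    candidate = [math.tanh(u + c) for u, c in zip(utterance, context)]

    # Gated recurrent blend of the old state and the candidate.
    new_state = [(1 - gate) * h + gate * c for h, c in zip(current, candidate)]

    # Copy so the caller's dict is left untouched.
    updated = dict(states)
    updated[speaker] = new_state
    return updated


# Toy three-turn dialogue with two speakers and 2-dimensional features.
dialogue = [("A", [1.0, 0.0]), ("B", [0.0, 1.0]), ("A", [0.5, 0.5])]
states = {}
for name, features in dialogue:
    states = update_speaker_states(states, name, features)
```

After the loop, `states` holds one vector per participant, and each vector reflects both that speaker's own utterances and the states of the others at the time they spoke. A classifier would read the current speaker's state alongside the utterance features to predict the emotion label.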