Article

A novel dual-modal emotion recognition algorithm with fusing hybrid features of audio signal and speech context

Yurui Xu, Automation School of Qingdao University, Institute of Future, Qingdao, China
Hang Su, Department of Electronics, Information and Bioengineering, Politecnico di Milano, 20133, Milan, Italy
Guijin Ma, Automation School of Qingdao University, Institute of Future, Qingdao, China
Xiaorui Liu, Automation School of Qingdao University, Institute of Future, Qingdao, China

2022 · English

ABI

Abstract

With regard to human–machine interaction, accurate emotion recognition remains a challenging problem. This paper explores whether feature abstraction and fusion can be completed by a homogeneous network component, and proposes a dual-modal emotion recognition framework composed of a parallel convolution (Pconv) module and an attention-based bidirectional long short-term memory (BLSTM) module. The Pconv module extracts multidimensional social features in parallel and provides a more effective representation capacity. The attention-based BLSTM module strengthens the extraction of key information while preserving the relevance between pieces of information. Experiments on the CH-SIMS dataset show a recognition accuracy of 74.70% on audio data and 77.13% on text, while the dual-modal fusion model reaches 90.02%. The experiments demonstrate the feasibility of processing heterogeneous information within a homogeneous network component, and show that the attention-based BLSTM module achieves the best coordination with the feature fusion realized by the Pconv module. This offers considerable flexibility for modality expansion and architecture design.
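The abstract names the two building blocks but not their internals. As a hedged illustration only, the following dependency-free Python sketch shows (a) a generic parallel-convolution step, where several 1-D kernels of different widths run side by side over the same sequence and their outputs are stacked per time step, and (b) dot-product attention pooling of the kind commonly placed on top of recurrent (e.g. BLSTM) outputs. The function names, kernel values, query vector and the concatenation-based `fuse` helper are hypothetical; the BLSTM itself is omitted, and this is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def conv1d_same(signal: Sequence[float], kernel: Sequence[float]) -> Vector:
    """1-D cross-correlation with zero padding, so output length equals input length."""
    k = len(kernel)
    pad_left = (k - 1) // 2
    out: Vector = []
    for t in range(len(signal)):
        acc = 0.0
        for j in range(k):
            idx = t + j - pad_left
            if 0 <= idx < len(signal):  # positions outside the signal count as zero
                acc += kernel[j] * signal[idx]
        out.append(acc)
    return out


def parallel_conv(signal: Sequence[float], kernels: Sequence[Sequence[float]]) -> List[Vector]:
    """Hypothetical Pconv-style step: run every kernel on the same input in parallel,
    then stack the branch outputs so each time step gets one value per branch."""
    branches = [conv1d_same(signal, k) for k in kernels]
    return [[branch[t] for branch in branches] for t in range(len(signal))]


def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states: Sequence[Sequence[float]], query: Sequence[float]) -> Tuple[Vector, Vector]:
    """Dot-product attention pooling: score each time step against a query vector,
    normalize the scores with softmax, and return the weighted sum plus the weights."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in states]
    weights = softmax(scores)
    dim = len(states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]
    return pooled, weights


def fuse(audio_vec: Sequence[float], text_vec: Sequence[float]) -> Vector:
    """Hypothetical dual-modal fusion by simple concatenation of pooled vectors."""
    return list(audio_vec) + list(text_vec)


if __name__ == "__main__":
    feats = parallel_conv([0.0, 1.0, 2.0, 3.0], [[1.0], [1.0, 1.0, 1.0]])
    pooled, weights = attention_pool(feats, query=[0.0, 1.0])
    print(feats)
    print(weights)
    print(fuse(pooled, [0.5, 0.5]))
```

With a width-1 and a width-3 kernel, each time step ends up with a two-dimensional feature (one value per branch); attention then weights time steps whose features align with the query more heavily. In a real model the kernels and query would be learned and the pooled states would come from a BLSTM rather than directly from the convolution.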

Identifiers

DOI: 10.1007/s40747-022-00841-3

Citations and references

Citations: 3 · References used: 0
