Article

Multimodal transformer augmented fusion for speech emotion recognition

Yuanyuan Wang (School of Artificial Intelligence, Xidian University, Xi'an, China)
Yu Gu (School of Artificial Intelligence, Xidian University, Xi'an, China)
Yifei Yin (Guangzhou Huya Technology Co., Ltd., Guangzhou, China)
Yingping Han (School of Artificial Intelligence, Xidian University, Xi'an, China)
He Zhang (School of Journalism and Communication, Northwest University, Xi'an, China)
Shuang Wang (School of Artificial Intelligence, Xidian University, Xi'an, China)
Chenyu Li (School of Artificial Intelligence, Xidian University, Xi'an, China)
Dou Quan (School of Artificial Intelligence, Xidian University, Xi'an, China)

2023, en

ABI

Abstract

Speech emotion recognition is challenging due to the subjectivity and ambiguity of emotion. In recent years, multimodal methods for speech emotion recognition have achieved promising results. However, because data from different modalities are heterogeneous, effectively integrating information across modalities remains both a difficulty and a potential breakthrough point for research. Moreover, owing to the limitations of feature-level fusion and decision-level fusion methods, previous studies have often neglected fine-grained modal interactions. We propose a method named multimodal transformer augmented fusion that uses a hybrid fusion strategy, combining feature-level fusion and model-level fusion methods, to perform fine-grained information interaction within and between modalities. A Model-fusion module composed of three Cross-Transformer Encoders is proposed to generate multimodal emotional representations for modal guidance and information fusion. Specifically, the multimodal features obtained by feature-level fusion and the text features are used to enhance the speech features. Our proposed method outperforms existing state-of-the-art approaches on the IEMOCAP and MELD datasets.
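The guidance idea in the abstract (text features and feature-level fused features enhancing speech features) can be illustrated with a minimal cross-attention sketch. This is not the authors' architecture: the function names, the single attention head, the absence of learned projections, the concatenation along the time axis as "feature-level fusion", and the residual sum are all assumptions made only for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Single-head scaled dot-product cross-attention.

    Assumption: no learned query/key/value projections; the same
    vectors serve as keys and values.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            for k in keys_values
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * v[j] for w, v in zip(weights, keys_values)) for j in range(d)]
        )
    return out


def enhance_speech(speech, text):
    """Hypothetical helper: speech frames attend to text and to fused features."""
    # Assumed feature-level fusion: concatenate both sequences along time.
    fused = speech + text
    from_text = cross_attention(speech, text)
    from_fused = cross_attention(speech, fused)
    # Assumed combination: residual sum of the speech frame and both contexts.
    return [
        [s + a + b for s, a, b in zip(s_row, a_row, b_row)]
        for s_row, a_row, b_row in zip(speech, from_text, from_fused)
    ]


# Toy inputs: 3 speech frames and 2 text tokens, both of dimension 2.
speech = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
text = [[0.2, 0.8], [0.9, 0.1]]
enhanced = enhance_speech(speech, text)
```

The output keeps one vector per speech frame, so the speech sequence length is preserved while each frame is mixed with context from the other modality.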


Identifiers

DOI: 10.3389/fnbot.2023.1181598

Citations and references

Citations: 2
References used: 0

Metrics - AkademScholar