Article

Emotion-Aware Speaker Diarization Based on Prosodic and Deep Embedding Integration

Kamoliddin Shukurov, U U Khasanov, Shokhrukhmirzo Kholdorov, Maftuna Karimova, Lutfulla Murodjonov
TUIT named after Mukhammad al-Khwarizmi, Department of Robotics and Intelligent Systems, Tashkent, Uzbekistan
2025
ABI

Abstract

Speaker diarization is the process of identifying speech segments in an audio stream and assigning each segment to a specific speaker. Because classical systems do not take prosodic features into account, their accuracy drops on emotional speech. This study proposes an emotion-sensitive speaker diarization system in which prosodic vectors derived from prosodic features modulate the embeddings of the ECAPA-TDNN model. The proposed model reduced the diarization error rate (DER) from 11.6% for the simple baseline model to 7.9%. It also has a low computational cost and delivers significant results in real-time systems.
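The abstract does not spell out how the prosodic vector modulates the speaker embedding. The following is a minimal sketch of one plausible reading, assuming a scale-and-shift (FiLM-style) modulation. The function names, the weight matrices `w_scale` and `w_shift`, the toy dimensions, and the random values are all hypothetical and are not taken from the paper. The last line only reproduces the arithmetic behind the reported DER figures.

```python
import math
import random


def modulate_embedding(speaker_emb, prosodic_vec, w_scale, w_shift):
    """Scale-and-shift modulation of a speaker embedding by a prosodic vector.

    ASSUMPTION: the abstract only says the two are combined "in a modulation
    manner"; this FiLM-style form is an illustrative guess, not the paper's
    confirmed method.
    """
    fused = []
    for i, value in enumerate(speaker_emb):
        # Per-dimension scale, kept in (0, 2) so it stays close to identity.
        scale = 1.0 + math.tanh(sum(w * p for w, p in zip(w_scale[i], prosodic_vec)))
        # Per-dimension additive shift driven by the prosodic vector.
        shift = sum(w * p for w, p in zip(w_shift[i], prosodic_vec))
        fused.append(scale * value + shift)
    # Unit-normalise so the result can be compared with cosine similarity.
    norm = math.sqrt(sum(x * x for x in fused))
    return [x / norm for x in fused]


def relative_der_reduction(baseline_der, proposed_der):
    """Relative reduction in diarization error rate, as a fraction."""
    return (baseline_der - proposed_der) / baseline_der


random.seed(0)
EMB_DIM, PROS_DIM = 8, 3  # toy sizes; real speaker embeddings are much larger
speaker_emb = [random.gauss(0, 1) for _ in range(EMB_DIM)]   # stand-in for an ECAPA-TDNN embedding
prosodic_vec = [0.5, -1.2, 0.3]                              # stand-in prosodic features
w_scale = [[random.gauss(0, 0.1) for _ in range(PROS_DIM)] for _ in range(EMB_DIM)]
w_shift = [[random.gauss(0, 0.1) for _ in range(PROS_DIM)] for _ in range(EMB_DIM)]

fused = modulate_embedding(speaker_emb, prosodic_vec, w_scale, w_shift)
print(len(fused))
print(round(relative_der_reduction(11.6, 7.9), 3))  # 0.319, i.e. about 32% relative
```

With an all-zero prosodic vector the scale is 1 and the shift is 0, so the sketch falls back to the plain normalised speaker embedding. In a trained system the weights would be learned rather than drawn at random.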
