Article

Emotion-Aware Speaker Diarization Based on Prosodic and Deep Embedding Integration

Kamoliddin Shukurov, TUIT named after Mukhammad al-Khwarizmi, Department of robotics and intelligent systems, Tashkent, Uzbekistan
U U Khasanov, TUIT named after Mukhammad al-Khwarizmi, Department of robotics and intelligent systems, Tashkent, Uzbekistan
Shokhrukhmirzo Kholdorov, TUIT named after Mukhammad al-Khwarizmi, Department of robotics and intelligent systems, Tashkent, Uzbekistan
Maftuna Karimova, TUIT named after Mukhammad al-Khwarizmi, Department of robotics and intelligent systems, Tashkent, Uzbekistan
Lutfulla Murodjonov, TUIT named after Mukhammad al-Khwarizmi, Department of robotics and intelligent systems, Tashkent, Uzbekistan
2025
ABI

Abstract

Speaker diarization is the process of identifying speech segments in an audio stream and assigning each segment to a specific speaker. Because classical systems do not take prosodic features into account, their accuracy drops on emotional speech. This study proposes an emotion-aware speaker diarization system in which prosodic vectors derived from prosodic features are combined with the embeddings of the ECAPA-TDNN model through modulation. The emotion-aware model reduced the diarization error rate (DER) from 11.6% for a simple baseline model to 7.9%. It also has a low computational cost and performs well in real-time systems.
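
The abstract does not give the modulation formula, so the following is only a minimal sketch of one plausible reading: a FiLM-style scale-and-shift, where a prosodic vector predicts a per-dimension gain and offset that are applied to a speaker embedding. The function names (`modulate`, `linear`, `l2_normalize`), the `tanh` bound on the gain, and the toy dimensions are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (returned unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def linear(weights, bias, vec):
    """Dense layer; `weights` is a list of rows, one row per output dimension."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def modulate(speaker_emb, prosodic_vec, w_gamma, b_gamma, w_beta, b_beta):
    """Hypothetical FiLM-style modulation (an assumption, not the paper's formula).

    A gain (gamma) and an offset (beta) are predicted from the prosodic vector
    and applied element-wise: out = (1 + gamma) * e + beta, then L2-normalized.
    """
    gamma = [math.tanh(g) for g in linear(w_gamma, b_gamma, prosodic_vec)]
    beta = linear(w_beta, b_beta, prosodic_vec)
    mixed = [(1.0 + g) * e + b for g, e, b in zip(gamma, speaker_emb, beta)]
    return l2_normalize(mixed)


def random_matrix(rng, rows, cols, scale=0.1):
    """Small random weights standing in for parameters that would be learned."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


# Toy sizes for readability; a real speaker embedding would be much larger.
EMB_DIM, PROS_DIM = 8, 3
rng = random.Random(0)

speaker_emb = [rng.gauss(0.0, 1.0) for _ in range(EMB_DIM)]
prosodic_vec = [0.4, -1.2, 0.7]  # placeholder values, e.g. pitch, energy, rate

emotion_aware_emb = modulate(
    speaker_emb,
    prosodic_vec,
    random_matrix(rng, EMB_DIM, PROS_DIM), [0.0] * EMB_DIM,
    random_matrix(rng, EMB_DIM, PROS_DIM), [0.0] * EMB_DIM,
)
```

With all modulation weights set to zero the gain and offset vanish, and the output reduces to the normalized speaker embedding. This is one reason a scale-and-shift form is a convenient choice: the prosodic branch can only adjust, not replace, the underlying speaker representation.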
