Article

Speech Emotion Recognition Using Machine Learning and Deep Learning Methods

Sohidul Islam (Port City International University, Department of CSE, Chattogram, Bangladesh)
Manoara Begum (Port City International University, Department of CSE, Chattogram, Bangladesh)
Md Akash Rahman (Port City International University, Department of CSE, Chattogram, Bangladesh)
Tanjim Mahmud (Rangamati Science and Technology University, Department of CSE, Rangamati, Bangladesh)
Tursunova Shaxnoza (Urgench branch of Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Urgench, Uzbekistan)
Sindor Sapaev (Urgench State University, Dept. of CSE, Urgench, Uzbekistan)
Sapayev Valisher Odilbek Uglu
Abubokor Hanip
Mohammad Shahadat Hossain (Port City International University, Department of CSE, Chattogram, Bangladesh)

2025 (en)

ABI

Abstract

Speech Emotion Recognition (SER) plays a crucial role in human-computer interaction, enabling systems to interpret and respond to human emotions. It has gained significant attention in recent years because of its applications in healthcare, virtual assistants, and affective computing. However, Bangla, despite being the seventh most spoken language globally, remains a low-resource language for SER because few datasets are publicly available. This study addresses that gap by developing an efficient SER system on SUBESCO, a Bangla emotional speech corpus. The proposed approach preprocesses the audio by removing noise and silence, then extracts MFCC-based features to capture essential emotional patterns. Both machine learning models (DT, KNN, MLP, SVM) and deep learning models (ANN, CNN, LSTM) were trained and evaluated to classify emotional states from speech, and the effectiveness of each model was assessed through extensive experimentation. Results demonstrate that the proposed method achieves state-of-the-art performance, attaining a validation accuracy of 92.50% with a validation loss of 24.06%. These outcomes emphasize the robustness of the proposed methodology and its effectiveness in recognizing emotions from Bangla speech data.
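The pipeline summarized in the abstract (silence removal, MFCC extraction, then a classifier such as KNN) can be illustrated with a minimal NumPy-only sketch. This is not the authors' implementation: the function names, the 25 ms frame and 10 ms hop at an assumed 16 kHz sample rate, the 13 coefficients, the 26 mel filters, and the -40 dB energy threshold are all illustrative assumptions, and noise removal is omitted.

```python
import numpy as np

def trim_silence(signal, frame_len=400, threshold_db=-40.0):
    """Drop frames whose mean energy is more than 40 dB below the loudest frame (assumed threshold)."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.mean(frames ** 2, axis=1) + 1e-12
    energy_db = 10.0 * np.log10(energy / energy.max())
    return frames[energy_db > threshold_db].reshape(-1)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale, one row per filter."""
    hz_to_mel = lambda hz: 2595.0 * np.log10(1.0 + hz / 700.0)
    mel_to_hz = lambda mel: 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):          # rising edge of the triangle
            bank[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):         # falling edge of the triangle
            bank[i - 1, k] = (right - k) / max(right - centre, 1)
    return bank

def mfcc(signal, sample_rate=16000, n_mfcc=13, n_filters=26,
         n_fft=512, frame_len=400, hop=160):
    """Return an array of shape (n_frames, n_mfcc); the signal must be at least frame_len samples."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sample_rate).T + 1e-10)
    # Unnormalized DCT-II over the filter axis decorrelates the log mel energies.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_mfcc)[:, None] * (2 * n[None, :] + 1) / (2 * n_filters))
    return log_mel @ basis.T

def utterance_vector(signal, sample_rate=16000):
    """Summarize a clip as the per-coefficient mean and standard deviation (an assumed pooling choice)."""
    feats = mfcc(trim_silence(signal), sample_rate=sample_rate)
    return np.concatenate([feats.mean(axis=0), feats.std(axis=0)])

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors closest to the query (Euclidean distance)."""
    dists = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```

In practice one would stack `utterance_vector` outputs for the labelled training clips into `train_x`, keep the emotion labels in `train_y`, and call `knn_predict` on the vector of an unseen clip. A library such as librosa or scikit-learn would normally replace these hand-written pieces.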


Topics

Speech Recognition and Synthesis; Speech and Audio Processing; Emotion and Mood Recognition

Identifiers

DOI: 10.1109/icict64420.2025.11005193

Citations and references

0 citations, 50 references
