Speech Emotion Recognition Using Machine Learning and Deep Learning Methods
Abstract
Speech Emotion Recognition (SER) plays a crucial role in human-computer interaction because it enables systems to interpret and respond to human emotions. It has attracted significant attention in recent years because of its applications in healthcare, virtual assistants, and affective computing. However, Bangla remains a low-resource language for SER because few public datasets exist, even though it is the seventh most spoken language in the world. This study addresses this gap by developing an efficient SER system using SUBESCO, a Bangla emotional speech corpus. The proposed approach first preprocesses the audio by removing noise and silence. It then extracts Mel-frequency cepstral coefficient (MFCC) features to capture the acoustic patterns that carry emotional information. Both machine learning models (DT, KNN, MLP, SVM) and deep learning models (ANN, CNN, LSTM) were trained and evaluated to classify emotional states from speech, and the effectiveness of each model was assessed through extensive experimentation. The results show that the proposed method achieves state-of-the-art performance, with a validation accuracy of 92.50% and a validation loss of 24.06%. These outcomes highlight the robustness of the proposed methodology and its effectiveness in recognizing emotions from Bangla speech.
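The preprocessing and feature-extraction stage described above can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the study's implementation. Silence is trimmed with a simple energy threshold. MFCCs follow a common textbook recipe: pre-emphasis, Hamming-windowed frames, power spectrum, mel filterbank, log, and DCT-II. Noise removal is omitted. The function names (`trim_silence`, `mfcc`, `mel_filterbank`, `dct_ii`) are hypothetical. All parameter values are also assumptions that the abstract does not specify: 16 kHz audio, 25 ms frames with a 10 ms hop, a 512-point FFT, 40 mel filters, 13 coefficients, and a -40 dB threshold.

```python
import numpy as np

# NOTE: all default parameter values below are assumptions, not the study's settings.


def trim_silence(signal, frame_len=400, hop=160, threshold_db=-40.0):
    """Drop leading/trailing frames whose RMS energy is more than
    |threshold_db| dB below the loudest frame (simple energy heuristic)."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    rms = np.array([
        np.sqrt(np.mean(signal[i * hop:i * hop + frame_len] ** 2))
        for i in range(n_frames)
    ])
    db = 20 * np.log10(rms / (rms.max() + 1e-12) + 1e-12)
    voiced = np.flatnonzero(db > threshold_db)
    if voiced.size == 0:
        return signal[:0]  # everything counted as silence
    return signal[voiced[0] * hop:voiced[-1] * hop + frame_len]


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to sr/2."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):       # rising edge
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank


def dct_ii(x, n_coeffs):
    """Orthonormal DCT-II along the last axis, keeping the first n_coeffs."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n)) * np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)
    return x @ basis.T


def mfcc(signal, sr=16000, n_fft=512, frame_len=400, hop=160,
         n_filters=40, n_mfcc=13, pre_emph=0.97):
    """Return an (n_frames, n_mfcc) matrix of MFCCs."""
    # 1. Pre-emphasis boosts high frequencies.
    emphasized = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    # 2. Overlapping frames (zero-pad so at least one full frame exists).
    if len(emphasized) < frame_len:
        emphasized = np.pad(emphasized, (0, frame_len - len(emphasized)))
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # 3. Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # 4. Log mel-filterbank energies (small offset avoids log(0)).
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # 5. DCT-II decorrelates the energies; keep the first n_mfcc coefficients.
    return dct_ii(log_mel, n_mfcc)


# Usage: a synthetic 1 s, 440 Hz tone padded with 0.5 s of silence on each side.
sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
audio = np.concatenate([np.zeros(sr // 2), tone, np.zeros(sr // 2)])
voiced = trim_silence(audio)
features = mfcc(voiced, sr=sr)
print(voiced.shape, features.shape)
```

Each row of the resulting matrix is one frame's 13-coefficient vector. The abstract does not state how the study shaped these features for the DT, KNN, MLP, SVM, ANN, CNN, and LSTM models. Any pooling step, such as averaging over frames for the classical classifiers, would therefore be a further assumption.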