Review article

Accent Classification Using Machine Learning Techniques: A Review

Sarah Jassim, Department of Computer Sciences, College of Science, University of Baghdad
Husam Ali Abdulmohsin, Department of Computer Sciences, College of Science, University of Baghdad
2025, en
ABI

Abstract

Accent is a person's distinct manner of speaking a particular language. It strongly influences communication by producing pronunciation variations, making it difficult for automatic speech recognition (ASR) systems to understand spoken language accurately. As the need for more accurate speech recognition technology grows, improving machines' ability to classify and recognize accents has become an essential challenge in speech processing. In response to this problem, this paper reviews previous studies on accent classification models. It discusses the principal methodologies used in this research, including datasets, preprocessing techniques, feature extraction, evaluation metrics, and classification methods based on traditional machine learning (TML) and deep learning (DL). The review covers journal articles and conference proceedings published between 2015 and 2025, with an emphasis on recent years. Relevant articles were sourced from leading academic databases and platforms, including Scopus, IEEE, Springer, MDPI, Google Scholar, and ResearchGate. The study concludes by identifying key research gaps and proposing future directions for advancing accent recognition systems, offering guidance for addressing current challenges and exploring new methodologies. A comparative analysis shows that k-NN is the most effective TML classifier. Among DL models, the pre-trained xResNet18 model outperforms the others on well-structured English accent datasets, while CNN achieves higher accuracy on datasets that contain diverse English accents but are relatively small. Additionally, the fine-tuned transformer Wav2Vec2 achieves higher overall accuracy on a balanced and diverse dataset of six English accents, demonstrating strong performance in raw audio-based accent classification.

Citations and sources

Citations: 2. Sources used: 0.