Speech-to-Text Models for the Uzbek Language: Achievements and Limitations
Abstract
In recent years, advances in artificial intelligence and natural language processing (NLP) have substantially changed how people interact with computers. Nonetheless, building robust Automatic Speech Recognition (ASR) systems for low-resource, agglutinative languages such as Uzbek remains a major challenge. This paper surveys the current state of Uzbek ASR technology, drawing on research published between 2020 and 2025. It identifies the main constraints: the language's complex morphology, considerable dialectal variation, and the scarcity of large annotated datasets. We examine architectures ranging from traditional DNN-HMM hybrids to newer End-to-End (E2E) models such as Transformers and Conformers. Comparative results show that E2E Conformer architectures combined with specialised language models (UzLM) perform better than the alternatives, reaching a Word Error Rate (WER) of 13.9%. The findings indicate that building larger open-source corpora, adopting self-supervised learning, and applying multilingual transfer learning are the most promising directions for future progress in Uzbek speech technology.
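For readers unfamiliar with the metric, the following is a minimal sketch of how Word Error Rate is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. The function name `word_error_rate` and the Uzbek example sentences are illustrative assumptions, not taken from the systems surveyed here, and the studies reviewed may apply additional text normalization before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Assumed example: a suffix split off from its stem counts as two errors
# (one substitution plus one insertion) against a three-word reference.
print(word_error_rate("men maktabga boraman", "men maktab ga boraman"))
```

Because the score is computed over whole words, a single mis-recognized or mis-segmented suffix makes the entire word count as wrong, which is one reason agglutinative languages such as Uzbek tend to be penalized more heavily under WER than under character-level metrics.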