Machine Learning for Uzbek Language Syntactic Analysis: A Review and Comparative Experiment
Abstract
The article discusses approaches to automatic syntactic analysis of Uzbek texts using statistical machine learning methods: Naive Bayes, support vector machines, and linear regression. The emphasis is on the specifics of Uzbek as an agglutinative language, which requires closer attention to morphological analysis and to flexible word order. The proposed methods produce stable results with limited amounts of labeled data and relatively low computational cost. The study includes an evaluation of the accuracy of syntactic dependency detection, a description of corpus preparation and data labeling, and recommendations for further improving the algorithms and expanding the experimental base. The results can be used to develop full-featured machine translation systems, automatic grammatical error correction, and other applications for processing written Uzbek. In addition, the authors conducted a comparative analysis of existing solutions, which helps establish the relevance of this work.