Automatic Determination of Semantic Relations in the Uzbek Language Using Machine Learning
Abstract
Automatically identifying semantic relations in Uzbek is challenging because the language is agglutinative, has complex morphology, and lacks large annotated text datasets. The primary goal of this work is to develop and evaluate machine learning approaches for identifying semantic relations (such as synonymy, antonymy, and hyponymy) between Uzbek words. We use static word embeddings (Word2Vec, FastText) and a contextual transformer model (BERT). Several models are trained and tested, and their performance is measured on semantic relation classification tasks. The experiments show that contextual embeddings significantly improve classification accuracy over static embedding models, at the cost of higher computational and data requirements. The novelty of this work lies in applying state-of-the-art NLP models specifically to Uzbek and in analyzing how the distinctive features of the language affect these models. We identify key weaknesses of current approaches, in particular insufficient handling of complex morphology and polysemy, and propose directions for improvement. The findings contribute to the development of Uzbek-language NLP tools for applications such as machine translation, information retrieval, and conversational AI.
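As a minimal illustration of why subword-aware static embeddings such as FastText are relevant for an agglutinative language, the following Python sketch extracts character n-grams with boundary markers and compares how many of them an inflected form shares with its stem. It is not the implementation used in this work: the function names `char_ngrams` and `jaccard`, the n-gram range of 3 to 6, and the example words are assumptions chosen for illustration only.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the set of character n-grams of `word`.

    The word is wrapped in '<' and '>' so that prefixes and suffixes
    are distinguishable, in the style of FastText subword units.
    The n-gram range is an assumption for this sketch.
    """
    marked = f"<{word}>"
    grams = set()
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams.add(marked[i:i + n])
    return grams


def jaccard(a, b):
    """Jaccard overlap between two sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Illustrative Uzbek words (hypothetical examples, not from the paper's data).
base = char_ngrams("kitob")        # "book"
plural = char_ngrams("kitoblar")   # "books" (stem + plural suffix -lar)
unrelated = char_ngrams("daraxt")  # "tree"

shared_with_plural = jaccard(base, plural)
shared_with_unrelated = jaccard(base, unrelated)

print(f"kitob vs kitoblar: {shared_with_plural:.2f}")
print(f"kitob vs daraxt:   {shared_with_unrelated:.2f}")
```

Because the inflected form keeps the stem's n-grams, a model that composes word vectors from subword units can relate unseen suffixed forms to known stems, whereas a purely word-level model treats each surface form as an unrelated token.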