Article

Computational Model of Morphology and Stemming of Uzbek Words on Complete Set of Endings

Ualsher Tukeyev, Al-Farabi Kazakh National University, Information systems department, Almaty, Kazakhstan
Nargiza Gabdullina, Al-Farabi Kazakh National University, Information systems department, Almaty, Kazakhstan
Nazerke Karipbayeva, Al-Farabi Kazakh National University, Information systems department, Almaty, Kazakhstan
Nilufar Abdurakhmonova, National University of Uzbekistan, Department of Computational and Applied Linguistics, Tashkent, Uzbekistan
Tolganay Balabekova, Al-Farabi Kazakh National University, Information systems department, Almaty, Kazakhstan
Aidana Karibayeva, Al-Farabi Kazakh National University, Information systems department, Almaty, Kazakhstan

2024, en

Abstract

Uzbek belongs to the Turkic language group and is a low-resource language, so expanding its linguistic and electronic resources is essential. Many natural language processing (NLP) tasks, such as stemming, segmentation, and morphological analysis, require a set of endings together with dictionaries of stems and stop words. This article presents a complete set of Uzbek endings and a dictionary of stems and stop words. The endings were collected for the two main parts of speech, the noun and the verb. The dictionary of verb endings includes all possible combinations of tenses, voices, moods, and participles. Using the collected linguistic resources, stemming programs for Uzbek texts were tested, problems were identified from the experimental results, and the program was revised accordingly. Experiments using the developed Uzbek linguistic resources showed an average accuracy of 94.18%.
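To illustrate how a set of endings, a stem dictionary, and a stop-word list can work together in stemming, here is a minimal sketch in Python. It is not the authors' program: the function names `stem` and `stem_text`, the longest-match strategy, and the tiny word lists are all assumptions chosen for illustration, and the real resources described in the abstract are far larger.

```python
# Toy resources (hypothetical, for illustration only; not the article's data).
ENDINGS = {"lar", "larim", "larimga", "larda", "ga", "da", "ni", "ning", "dan", "im"}
STEMS = {"kitob", "maktab", "bola"}      # "book", "school", "child"
STOP_WORDS = {"va", "bilan", "uchun"}    # "and", "with", "for"


def stem(word, endings=ENDINGS, stems=STEMS, stop_words=STOP_WORDS):
    """Return the stem of `word` by stripping the longest matching ending.

    An ending is accepted only if the remaining part is a known stem.
    Stop words, known stems, and unanalysable words are returned unchanged.
    """
    token = word.lower()
    if token in stop_words or token in stems:
        return token
    # Try longer endings first so that a full ending such as "larimga"
    # wins over one of its shorter tails such as "ga".
    for ending in sorted(endings, key=len, reverse=True):
        if token.endswith(ending):
            candidate = token[: -len(ending)]
            if candidate in stems:
                return candidate
    return token


def stem_text(text, stop_words=STOP_WORDS):
    """Split on whitespace, drop stop words, and stem the remaining tokens."""
    return [stem(t) for t in text.split() if t.lower() not in stop_words]


if __name__ == "__main__":
    print(stem("kitoblarimga"))
    print(stem_text("Kitoblarimga va maktabda"))
```

Checking each candidate against the stem dictionary is one simple way to avoid over-stripping; a word whose remainder is not a known stem is left untouched in this sketch.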

Topics

Education, Innovation and Language Studies
Economic and Industrial Development
Engineering and Agricultural Innovations

Identifiers

DOI: 10.1109/piere62470.2024.10805062

Citations and references

Cited by: 8
References: 7