Article

Computational Model of Morphology and Stemming of Uzbek Words on Complete Set of Endings

Ualsher Tukeyev, Al-Farabi Kazakh National University, Information systems department, Almaty, Kazakhstan
Nargiza Gabdullina, Al-Farabi Kazakh National University, Information systems department, Almaty, Kazakhstan
Nazerke Karipbayeva, Al-Farabi Kazakh National University, Information systems department, Almaty, Kazakhstan
Nilufar Abdurakhmonova, National University of Uzbekistan, Department of Computational and Applied Linguistics, Tashkent, Uzbekistan
Tolganay Balabekova, Al-Farabi Kazakh National University, Information systems department, Almaty, Kazakhstan
Aidana Karibayeva, Al-Farabi Kazakh National University, Information systems department, Almaty, Kazakhstan
2024, en
ABI

Abstract

Uzbek is a Turkic language and one of the low-resource languages, so expanding its linguistic and electronic resources is essential. Many natural language processing (NLP) tasks, such as stemming, segmentation, and morphological analysis, require a set of endings together with a dictionary of stems and stop words. This article presents a complete set of Uzbek endings and a dictionary of stems and stop words. The endings were collected for the two main parts of speech, the noun and the verb. The dictionary of verb endings covers all possible combinations of tenses, voices, moods, and participles. Using the collected linguistic resources, stemming programs for Uzbek texts were tested, problems were identified from the experimental results, and the program was revised accordingly. In the experiments, stemming with the developed Uzbek linguistic resources achieved an average accuracy of 94.18%.
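To make the role of the three resources concrete, here is a minimal sketch of dictionary-based stemming by longest-ending match. It is not the authors' program. The function name `stem` and the tiny `ENDINGS`, `STEMS`, and `STOP_WORDS` sets are illustrative assumptions; the article's complete set of endings and its dictionaries are not reproduced here.

```python
# Illustrative subsets only (assumptions); the article's complete resources
# are far larger and also cover verb endings.
ENDINGS = {
    "lar", "ni", "ning", "ga", "da", "dan",          # plural and case endings
    "larni", "larning", "larga", "larda", "lardan",  # plural + case combinations
}
STEMS = {"kitob", "maktab", "bola"}
STOP_WORDS = {"va", "bilan", "uchun"}


def stem(word, endings=ENDINGS, stems=STEMS, stop_words=STOP_WORDS):
    """Return a stem for `word` by stripping one ending, longest first."""
    w = word.lower()
    # Stop words and known stems are returned unchanged.
    if w in stop_words or w in stems:
        return w

    by_length = sorted(endings, key=len, reverse=True)

    # First pass: accept only a split whose remainder is a known stem.
    for ending in by_length:
        if w.endswith(ending) and w[: -len(ending)] in stems:
            return w[: -len(ending)]

    # Fallback: strip the longest matching ending, keeping at least
    # two characters so short words are not emptied.
    for ending in by_length:
        if w.endswith(ending) and len(w) - len(ending) >= 2:
            return w[: -len(ending)]

    return w


if __name__ == "__main__":
    for token in ["kitoblarni", "maktabda", "bolalarning", "va"]:
        print(token, "->", stem(token))
```

Checking the stem dictionary before falling back to plain suffix stripping is one way to reduce over-stemming when a short ending such as "da" happens to match the tail of a longer stem; whether the authors' program resolves such cases the same way is not stated in the abstract.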
