Article

MorphUz: Morphological Analyzer for the Uzbek Language

Nilufar Abdurakhmonova, National University of Uzbekistan, Uzbek linguistics department, Tashkent, Uzbekistan

Ismailov Alisher, Andijan machine building institute, Innovative Educational department, Andijan, Uzbekistan

Rano Sayfulleyeva, National University of Uzbekistan, Uzbek linguistics department, Tashkent, Uzbekistan

2022 7th International Conference on Computer Science and Engineering (UBMK), conference, 2022, en


Abstract

Uzbek is an agglutinative language: words are formed from stems (roots) by concatenating affixes. This property produces a large number of morpheme combinations and greatly increases the vocabulary size. Words are therefore split into sub-word units for use in text and speech processing applications. Well-chosen sub-word units not only give high coverage and a smaller lexicon, but also carry the semantic and syntactic information that downstream applications need. This paper presents MorphUz, a morphological analyzer for natural language processing and machine learning purposes that splits the words of a text into sequences of morphemes. A morphological analyzer is one of the main components of natural language processing. MorphUz is an open-source morphological analyzer for the Uzbek language and is available as a website for exploration. It implements Uzbek morphology following a two-level approach that combines stemming and suffix analysis. MorphUz is implemented in PHP and JavaScript with a MySQL database.
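To illustrate what splitting a word into a stem and a sequence of suffixes can look like, here is a minimal Python sketch. It is not the MorphUz implementation (which the abstract describes as PHP, JavaScript, and MySQL) and is not taken from the paper: the `STEMS` and `SUFFIXES` tables, the `segment` function, and the longest-suffix-first strategy are all assumptions chosen for illustration, and the tiny lexicon covers only a handful of noun forms.

```python
from typing import List, Optional

# Hypothetical toy lexicon; a real analyzer would load stems from a database.
STEMS = {"kitob", "maktab", "uy"}

# Hypothetical, very incomplete list of Uzbek noun suffixes
# (plural, possessive, and case markers), for illustration only.
SUFFIXES = ["lar", "imiz", "im", "ing", "i", "ning", "ni", "ga", "da", "dan"]

# Try longer suffixes first so that "ning" is preferred over "ing" or "i".
_SUFFIXES_BY_LENGTH = sorted(SUFFIXES, key=len, reverse=True)


def segment(word: str) -> Optional[List[str]]:
    """Split a word into [stem, suffix, ...] or return None if no analysis is found.

    Suffixes are stripped from the right, recursively, until the remainder
    is a known stem. Phonological alternations are not handled.
    """
    if word in STEMS:
        return [word]
    for suffix in _SUFFIXES_BY_LENGTH:
        if word.endswith(suffix) and len(word) > len(suffix):
            head = segment(word[: -len(suffix)])
            if head is not None:
                return head + [suffix]
    return None


if __name__ == "__main__":
    print(segment("kitoblarimizda"))  # ['kitob', 'lar', 'imiz', 'da']
    print(segment("maktabda"))        # ['maktab', 'da']
    print(segment("xyz"))             # None
```

The recursion backtracks when a stripped suffix leaves a remainder that cannot be analyzed, so an unknown word yields `None` rather than a partial split. A production analyzer would also need suffix ordering constraints and sound-change rules, which this sketch leaves out.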

Topics

Natural Language Processing Techniques

Identifiers

DOI: 10.1109/ubmk55850.2022.9919579

Citations and references

Cited by 170 references
