Article

Algorithms for Parsing Roots and Stems of Words in Uzbek Language

Elov Botir Boltayeviç, Xusainova Zilola Yuldashevna, Umirova Svetlana Mamurjonovna, Normamatov SuUonbek Ermamatovich, A.Şahlo Abdimurod Kızı, Mahmadiev Shavkatjon

2024, en

ABI

Abstract

Determining the roots of words in the Uzbek language raises several problems: homonymy between a root and a suffixed form of the same root; words that undergo a sound change; and the stemming of new words and named entities (NER, named entity recognition). The UzbStemming algorithm presented in this article achieves a performance score of 97.5%, addresses the above problems, and can be used in the development of NLP applications such as spelling checkers, machine translation, question-answering systems, and syntactic and semantic analyzers. Lemmatization, the process of determining the canonical forms of the words in a text, is an important component of any NLP system and an important processing step for most applications based on natural language understanding. Lemmatization represents the meaning of a word more accurately than stemming does. This article presents methods and algorithms for lemmatizing various Uzbek words, idioms, named entities, neologisms, and abbreviations.

Identifiers

DOI: 10.1109/ubmk63289.2024.10773598

Citations and references

Citations: 4. References used: 0.
