Article

A Morphological Tagging Model of the Uzbek Language in the Universal Dependencies Format

Elov Botir Boltayevich, Tashkent State University of Uzbek Language and Literature Named Alisher Navo’i, Dept. of Computational Linguistics and Digital Technologies, Tashkent, Uzbekistan
Khamroeva Shahlo Mirdjonovna, Tashkent State University of Uzbek Language and Literature Named Alisher Navo’i, Dept. of Computational Linguistics and Digital Technologies, Tashkent, Uzbekistan
Kurbanova Mukhabbat Matyakubovna, National University of Uzbekistan, Department of Uzbek Linguistics, Tashkent, Uzbekistan
Xusainova Zilola Yuldashevna, Tashkent State University of Uzbek Language and Literature, Computer Linguistics and Digital Technology, Tashkent, Uzbekistan
Alavutdinova Nadira Ganiyevna, National University of Uzbekistan Named Mirzo Ulugbek, Department of Uzbek Philology, Tashkent, Uzbekistan

2025

ABI

Abstract

This study presents a morphological tagging model and algorithm for the Uzbek language, structured according to the Universal Dependencies (UD) framework. To account for the agglutinative nature of Uzbek, morphotactic rules are defined systematically for the major word classes (nouns, verbs, adjectives, numerals, adverbs, and pronouns) using deterministic finite automata (DFA) and context-free grammars. The model specifies the class-specific affix sequences and their formal mapping to the UD FEATS schema, and it includes a mathematical representation together with a segmentation algorithm. The resulting rule-based morphological analyzer supports lemmatization, affix segmentation, and automatic grammatical tagging in compliance with UD standards. The system was validated on over 20 example words, for which the corresponding UPOS, FEATS, and lemma annotations were generated. In addition, the UD FEATS mapping conventions, particularly those for tense, mood, voice, degree, and case, are analyzed in detail for each word class. The proposed model offers a foundational morphological component for applications such as UD-compatible corpora for Uzbek, machine translation systems, language instruction tools, and syntactic parsers.
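To make the idea of slot-ordered morphotactics concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a hypothetical two-word lexicon, a small suffix inventory for consonant-final noun stems only (vowel-final allomorphs such as -m and -si are omitted), and default values Case=Nom and Number=Sing for unmarked nouns. The names LEXICON, SUFFIXES, TRANSITIONS, and analyze are illustrative and do not come from the paper.

```python
# Hypothetical toy lexicon of consonant-final noun stems (assumption).
LEXICON = ("kitob", "maktab")

# Suffix slots mapped to UD features (assumed mapping, consonant-final forms only).
SUFFIXES = {
    "PL": {"lar": {"Number": "Plur"}},
    "POSS": {
        "imiz": {"Number[psor]": "Plur", "Person[psor]": "1"},
        "ingiz": {"Number[psor]": "Plur", "Person[psor]": "2"},
        "im": {"Number[psor]": "Sing", "Person[psor]": "1"},
        "ing": {"Number[psor]": "Sing", "Person[psor]": "2"},
        "i": {"Person[psor]": "3"},
    },
    "CASE": {
        "ning": {"Case": "Gen"},
        "ni": {"Case": "Acc"},
        "ga": {"Case": "Dat"},
        "dan": {"Case": "Abl"},
        "da": {"Case": "Loc"},
    },
}

# Automaton over suffix slots: each state lists the slots that may follow it.
# Every state is accepting, so a bare stem is a valid noun.
TRANSITIONS = {
    "STEM": ["PL", "POSS", "CASE"],
    "PL": ["POSS", "CASE"],
    "POSS": ["CASE"],
    "CASE": [],
}


def _parse(rest, state):
    """Return (surface, slot) pairs that consume `rest` from `state`, or None."""
    if not rest:
        return []
    for slot in TRANSITIONS[state]:
        for surface in SUFFIXES[slot]:
            if rest.startswith(surface):
                tail = _parse(rest[len(surface):], slot)
                if tail is not None:
                    return [(surface, slot)] + tail
    return None  # no legal suffix sequence matches


def analyze(word):
    """Segment a noun and build a UD-style FEATS string, or return None."""
    for stem in LEXICON:
        if not word.startswith(stem):
            continue
        path = _parse(word[len(stem):], "STEM")
        if path is None:
            continue
        feats = {"Case": "Nom", "Number": "Sing"}  # assumed defaults
        for surface, slot in path:
            feats.update(SUFFIXES[slot][surface])
        return {
            "lemma": stem,
            "upos": "NOUN",
            "segments": [stem] + [surface for surface, _ in path],
            "feats": "|".join(f"{k}={v}" for k, v in sorted(feats.items())),
        }
    return None


print(analyze("kitoblarimni"))
```

Under these assumptions, `analyze("kitoblarimni")` segments the word as kitob + lar + im + ni and lists the features in alphabetical order, as the CoNLL-U FEATS column expects. A fuller analyzer would need a real lexicon, allomorph handling, and ambiguity resolution, none of which this sketch attempts.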

Topics

Education, Innovation and Language Studies; Economic and Industrial Development

Identifiers

DOI: 10.1109/ubmk67458.2025.11207015

Citations and references

Citations: 0; references used: 3
