A Morphological Tagging Model of the Uzbek Language in the Universal Dependencies Format
Abstract
This study presents a morphological tagging model and algorithm for Uzbek that follows the Universal Dependencies (UD) framework. Because Uzbek is agglutinative, morphotactic rules are defined for the major word classes (nouns, verbs, adjectives, numerals, adverbs, and pronouns) using deterministic finite automata (DFA) and context-free grammars. The model specifies the class-specific affix sequences and their formal mapping to the UD FEATS schema, and it includes a mathematical representation and a segmentation algorithm. The resulting rule-based morphological analyzer performs lemmatization, affix segmentation, and automatic grammatical tagging in compliance with UD standards. The system was validated on more than 20 example words, for each of which UPOS, FEATS, and lemma annotations were generated. The UD FEATS mapping conventions, particularly those for tense, mood, voice, degree, and case, are analyzed in detail for each word class. The proposed model provides a foundational morphological component for applications such as UD-compatible Uzbek corpora, machine translation systems, language instruction tools, and syntactic parsers.
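To make the idea of DFA-style morphotactics with a FEATS mapping concrete, the following is a minimal sketch for nouns only. It is not the analyzer described in this study: the one-word lexicon, the small suffix inventory, the slot order plural → possessive → case, and the function names `parse_suffixes` and `analyze` are illustrative assumptions, and vowel-stem allomorphs and phonological alternations are ignored.

```python
# Illustrative sketch only: a tiny noun analyzer with ordered suffix slots.
# The lexicon and suffix tables are hypothetical subsets chosen for this example.

LEXICON = ["kitob"]  # assumed stem list ("book")

PLURAL = {"lar": {"Number": "Plur"}}
POSSESSIVE = {
    "im": {"Number[psor]": "Sing", "Person[psor]": "1"},
    "ing": {"Number[psor]": "Sing", "Person[psor]": "2"},
    "i": {"Person[psor]": "3"},
}
CASE = {
    "ning": {"Case": "Gen"},
    "ni": {"Case": "Acc"},
    "ga": {"Case": "Dat"},
    "dan": {"Case": "Abl"},
    "da": {"Case": "Loc"},
}

# Slot k may only be followed by slots with a higher index; this ordering
# plays the role of the DFA's transition structure.
SLOTS = [PLURAL, POSSESSIVE, CASE]


def parse_suffixes(rest, state=0):
    """Yield (segments, feats) for every way to consume `rest` from `state`."""
    if not rest:
        yield [], {}
        return
    for slot in range(state, len(SLOTS)):
        for affix, feats in SLOTS[slot].items():
            if rest.startswith(affix):
                for segs, more in parse_suffixes(rest[len(affix):], slot + 1):
                    yield [affix] + segs, {**feats, **more}


def analyze(word):
    """Return all analyses of `word` as dicts with lemma, UPOS, segments, FEATS."""
    results = []
    for lemma in LEXICON:
        if not word.startswith(lemma):
            continue
        for segs, feats in parse_suffixes(word[len(lemma):]):
            # Unmarked nouns are treated here as singular nominative (an assumption).
            feats.setdefault("Number", "Sing")
            feats.setdefault("Case", "Nom")
            # UD serializes FEATS sorted alphabetically, ignoring letter case.
            ordered = sorted(feats.items(), key=lambda kv: kv[0].lower())
            results.append({
                "lemma": lemma,
                "upos": "NOUN",
                "segments": [lemma] + segs,
                "feats": "|".join(f"{k}={v}" for k, v in ordered),
            })
    return results


if __name__ == "__main__":
    for analysis in analyze("kitoblarimni"):  # "my books" (accusative)
        print(analysis)
```

Under these assumptions, `analyze("kitoblarimni")` segments the word as `kitob + lar + im + ni`, while a form that violates the slot order, such as `kitobnilar`, receives no analysis. Backtracking over the slots is used so that competing prefixes such as `-ni` and `-ning` are both tried.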