Article

UzTreebank: Methodological and Practical Issues in Building a Syntactic Treebank for the Uzbek Language

Eşref Adalı
Istanbul Technical University, Computer Engineering and Informatics Faculty, Istanbul, Türkiye

Abdullayeva Oqila Xolmo‘minovna
Tashkent State University of Uzbek Language and Literature Named Alisher Navo’i, Dept. of Computational Linguistics and Digital Technologies, Tashkent, Uzbekistan

2025

Abstract

In recent years, syntactic and semantic analysis tools have become increasingly important in various subfields of Natural Language Processing (NLP). These tools enable automatic parsing of sentences in large-scale language corpora, allowing researchers to uncover the syntactic structures and statistical regularities of a given language. This study focuses on the development and evaluation of syntactic parsing models for the Uzbek language, employing two widely used approaches: constituency parsing and dependency parsing. For constituency parsing, a rule-based system was developed to identify noun and verb phrases along with their internal constituents. For dependency parsing, a set of hand-crafted linguistic rules was created and applied to syntactically analyze simple Uzbek sentences. As a result of this work, a dependency-based syntactic treebank for Uzbek, named UzTreebank, was constructed. The treebank includes 20,000 automatically parsed simple sentences, of which 10,000 were manually annotated. Additionally, 36 syntactic templates of simple sentences were identified, and 50 linguistic rules were formalized and integrated into the system. The suboptimal performance of the system at its current stage is primarily attributed to the absence of hybrid modeling approaches and the limited size of the training corpus. The paper presents an overview of the rule-based architecture, parsing results, and the current stage of syntactic resource development for the Uzbek language.

Topics

Education, Innovation and Language Studies

Identifiers

DOI: 10.1109/ubmk67458.2025.11206779

Citations and references

Cited by 07 references
