Article

UzTreebank: Methodological and Practical Issues in Building a Syntactic Treebank for the Uzbek Language

Eşref Adalı, Istanbul Technical University, Computer Engineering and Informatics Faculty, Istanbul, Türkiye
Abdullayeva Oqila Xolmo‘minovna, Tashkent State University of Uzbek Language and Literature Named Alisher Navo’i, Dept. of Computational Linguistics and Digital Technologies, Tashkent, Uzbekistan
2025
ABI

Abstract

In recent years, syntactic and semantic analysis tools have become increasingly important in various subfields of Natural Language Processing (NLP). These tools enable automatic parsing of sentences in large-scale language corpora, allowing researchers to uncover the syntactic structures and statistical regularities of a given language. This study focuses on the development and evaluation of syntactic parsing models for the Uzbek language, employing two widely used approaches: constituency parsing and dependency parsing. For constituency parsing, a rule-based system was developed to identify noun and verb phrases along with their internal constituents. For dependency parsing, a set of hand-crafted linguistic rules was created and applied to syntactically analyze simple Uzbek sentences. As a result of this work, a dependency-based syntactic treebank for Uzbek, named UzTreebank, was constructed. The treebank includes 20,000 automatically parsed simple sentences, of which 10,000 were manually annotated. Additionally, 36 syntactic templates of simple sentences were identified, and 50 linguistic rules were formalized and integrated into the system. The suboptimal performance of the system at its current stage is primarily attributed to the absence of hybrid modeling approaches and the limited size of the training corpus. The paper presents an overview of the rule-based architecture, parsing results, and the current stage of syntactic resource development for the Uzbek language.
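To make the idea of hand-crafted dependency rules concrete, the following is a minimal sketch of how a few such rules might be applied to a POS-tagged simple sentence. It is an illustration only, not the UzTreebank system: the three rules, the tag set (`ADJ`, `NOUN`, `VERB`), the relation labels, and the names `Arc` and `parse_simple_sentence` are all assumptions made for this example, and the abstract does not specify the paper's 50 rules.

```python
from typing import List, NamedTuple, Tuple


class Arc(NamedTuple):
    """One dependency arc; all fields are illustrative, not from the paper."""
    dependent: int  # index of the dependent token
    head: int       # index of the head token, -1 for the sentence root
    label: str      # relation label (UD-style names, chosen for this sketch)


def parse_simple_sentence(tagged: List[Tuple[str, str]]) -> List[Arc]:
    """Apply three hypothetical hand-written rules to (word, POS) pairs."""
    # Assumed rule 1: Uzbek is verb-final, so the last VERB is the root.
    verbs = [i for i, (_, pos) in enumerate(tagged) if pos == "VERB"]
    if not verbs:
        raise ValueError("this sketch only handles sentences with a verb")
    root = verbs[-1]
    arcs = [Arc(root, -1, "root")]

    for i, (word, pos) in enumerate(tagged):
        if i == root:
            continue
        head, label = root, "dep"  # fallback: attach to the root
        if pos == "ADJ":
            # Assumed rule 2: an adjective modifies the next noun to its right.
            for j in range(i + 1, len(tagged)):
                if tagged[j][1] == "NOUN":
                    head, label = j, "amod"
                    break
        elif pos == "NOUN":
            # Assumed rule 3: the accusative suffix "-ni" marks a direct object;
            # any other noun is treated as the subject. A bare string check
            # like this mislabels nouns whose stem happens to end in "ni".
            label = "obj" if word.endswith("ni") else "nsubj"
        arcs.append(Arc(i, head, label))

    return sorted(arcs, key=lambda arc: arc.dependent)


# "Kichik bola kitobni o'qidi" ("The small child read the book")
sentence = [("Kichik", "ADJ"), ("bola", "NOUN"),
            ("kitobni", "NOUN"), ("o'qidi", "VERB")]
for arc in parse_simple_sentence(sentence):
    print(arc)
```

A real rule set would rely on morphological analysis rather than suffix string matching, and would need rules for the remaining parts of speech and sentence templates.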
