Article

Development of a Rule-based Model and Algorithm for Predicate Identification in Uzbek Language Texts

Maksud Sharipov, Urgench State University Named After Abu Rayhan Biruni, Dept. Computer Sciences, Urgench, Uzbekistan
Ikhtiyor Avezmatov, Urgench State University Named After Abu Rayhan Biruni, Dept. Computer Sciences, Urgench, Uzbekistan
Hushnudbek S. Adinaev, Urgench State University Named After Abu Rayhan Biruni, Dept. Computer Sciences, Urgench, Uzbekistan

2025

Abstract

This paper presents a rule-based approach to automatic predicate identification in Uzbek texts. The system encodes five core syntactic patterns that capture verbal-noun + modal constructions, personal endings on non-verbal tokens, auxiliary-verb combinations, and rich tense-mood affixation. These linguistically motivated rules consult two XML resources: mustaqil_fel.xml, which contains 8,500 verb stems, and istisno.xml, which holds rule-specific exception sets. Evaluation on a 300-sentence gold-standard corpus balanced across literary, scientific, and conversational genres yielded an overall F1-score of 0.86. These results show that carefully designed rules can capture language-specific syntactic regularities with high precision and provide high-confidence predicate detection for Uzbek. Given the lack of large annotated corpora and the low-resource nature of Uzbek, such rules remain a viable, interpretable solution and supply a transparent baseline for future hybrid or data-driven enhancements in Uzbek NLP.
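To make the lexicon-plus-exceptions idea concrete, here is a minimal Python sketch of a single tense-affix rule. It is not the authors' implementation: the stem set, exception set, and suffix list are tiny hypothetical stand-ins for mustaqil_fel.xml, istisno.xml, and the paper's tense-mood affix rules, and the function names are invented for illustration.

```python
import re

# Hypothetical stand-in for the verb-stem lexicon (mustaqil_fel.xml).
VERB_STEMS = {"kel", "yoz", "o'qi", "bor", "ol", "ye"}

# Hypothetical stand-in for an exception set (istisno.xml):
# "olma" (apple) looks like the stem "ol" plus the negative suffix "ma".
EXCEPTIONS = {"olma"}

# A few illustrative verbal endings; this is an assumption, not the paper's list.
# Sorted longest first so that "dim" is tried before "di".
VERB_SUFFIXES = sorted(("yapman", "yapti", "adi", "dim", "gan", "di", "ma"),
                       key=len, reverse=True)


def is_predicate(token: str) -> bool:
    """Return True if token is a known verb stem plus a known verbal ending."""
    token = token.lower()
    if token in EXCEPTIONS:
        return False
    for suffix in VERB_SUFFIXES:
        if token.endswith(suffix) and token[: -len(suffix)] in VERB_STEMS:
            return True
    return False


def find_predicates(sentence: str) -> list[str]:
    """Tokenize on word characters and apostrophes, then apply the rule."""
    tokens = re.findall(r"[\w']+", sentence.lower())
    return [t for t in tokens if is_predicate(t)]
```

Checking the exception set before the stem lookup means a lexical exception always overrides the affix rule, which is one simple way to keep such rules high-precision.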

Topics

Educational Technology and Assessment

Identifiers

DOI: 10.1109/ubmk67458.2025.11206891

Citations and references

Cited by 1 · 4 references