
Development of a Rule-based Model and Algorithm for Predicate Identification in Uzbek Language Texts

Maksud Sharipov, Ikhtiyor Avezmatov, Hushnudbek S. Adinaev
Urgench State University Named After Abu Rayhan Biruni, Dept. Computer Sciences, Urgench, Uzbekistan
2025
ABI

Abstract

This paper presents a rule-based approach to automatic predicate identification in Uzbek texts. The system encodes five core syntactic patterns that capture verbal-noun + modal constructions, personal endings on non-verbal tokens, auxiliary-verb combinations, and rich tense-mood affixation. These linguistically motivated rules draw on two XML resources: mustaqil_fel.xml, which contains 8,500 verb stems, and istisno.xml, which holds rule-specific exception sets. Evaluated on a 300-sentence gold-standard corpus balanced across literary, scientific, and conversational genres, the system achieved an overall F1-score of 0.86. These results show that carefully designed rules can capture language-specific syntactic regularities with high precision, providing high-confidence predicate detection for Uzbek and a transparent baseline for future hybrid or data-driven models. Given the scarcity of large annotated corpora for Uzbek, a low-resource language, such methods remain a viable and interpretable solution.
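To make the general design concrete, here is a minimal sketch of a rule-based predicate identifier in the spirit of the abstract: a verb-stem lexicon and per-rule exception sets loaded from XML, two-token rules (verbal noun + modal, word + auxiliary) applied before one-token rules (tense-mood affixation on a known stem, personal ending on a non-verbal token), and exact-span F1 for evaluation. It is not the authors' implementation and does not reproduce their five rules. The XML schemas (`<verb>`, `<rule id="...">`, `<word>`), the tiny inline lexicons, the suffix and modal lists, the rule ordering, and the exact-match scoring are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature stand-ins for mustaqil_fel.xml and istisno.xml.
# The real files' schemas are not described in the abstract.
VERBS_XML = """<verbs>
  <verb>kel</verb><verb>bor</verb><verb>o'qi</verb><verb>yoz</verb>
</verbs>"""

EXCEPTIONS_XML = """<exceptions>
  <rule id="personal_ending">
    <word>siz</word><word>insan</word><word>suvsiz</word>
  </rule>
</exceptions>"""

# Illustrative (incomplete) affix and function-word lists; assumptions, not the paper's.
MODALS = {"kerak", "mumkin", "lozim", "shart"}
AUXILIARIES = {"edi", "ekan", "emish"}
VERBAL_NOUN_ENDINGS = ("sh", "shim", "shing", "shi", "shimiz", "shingiz")
PERSONAL_ENDINGS = ("man", "san", "miz", "siz", "dir")
TENSE_MOOD_SUFFIXES = {"di", "dim", "ding", "dik", "adi", "ydi", "aman", "yman",
                       "yapti", "yapman", "moqda", "gan", "sa"}


def load_verb_stems(xml_text):
    """Return the set of verb stems stored as <verb> elements."""
    return {el.text.strip() for el in ET.fromstring(xml_text).iter("verb")}


def load_exceptions(xml_text):
    """Return {rule_id: set of exception words} from <rule id=...><word> elements."""
    root = ET.fromstring(xml_text)
    return {rule.get("id"): {w.text.strip() for w in rule.iter("word")}
            for rule in root.iter("rule")}


def tokenize(sentence):
    """Lowercase, split on whitespace, strip edge punctuation (apostrophes kept)."""
    tokens = [t.strip(".,!?;:").lower() for t in sentence.split()]
    return [t for t in tokens if t]


def is_inflected_verb(token, stems):
    """True if the token is a known stem followed by a listed tense-mood suffix."""
    return any(token.startswith(s) and token[len(s):] in TENSE_MOOD_SUFFIXES
               for s in stems)


def identify_predicates(sentence, stems, exceptions):
    """Return sorted (start, end, text) predicate spans, end exclusive."""
    tokens = tokenize(sentence)
    used, spans = set(), []

    def claim(start, end):
        spans.append((start, end, " ".join(tokens[start:end])))
        used.update(range(start, end))

    # Two-token rules run first so their tokens are not re-claimed below.
    for i in range(1, len(tokens)):
        prev, tok = tokens[i - 1], tokens[i]
        if i - 1 in used:
            continue
        if tok in MODALS and prev.endswith(VERBAL_NOUN_ENDINGS):
            claim(i - 1, i + 1)  # e.g. verbal noun + "kerak"
        elif tok in AUXILIARIES:
            claim(i - 1, i + 1)  # e.g. participle + "edi"

    # One-token rules: inflected verb first, then personal ending on other tokens.
    personal_exc = exceptions.get("personal_ending", set())
    for i, tok in enumerate(tokens):
        if i in used:
            continue
        if is_inflected_verb(tok, stems):
            claim(i, i + 1)
        elif tok.endswith(PERSONAL_ENDINGS) and tok not in personal_exc:
            claim(i, i + 1)
    return sorted(spans)


def span_f1(predicted, gold):
    """Exact-match span F1 (an assumed matching criterion)."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    stems = load_verb_stems(VERBS_XML)
    exceptions = load_exceptions(EXCEPTIONS_XML)
    print(identify_predicates("Men ertaga borishim kerak.", stems, exceptions))
    # → [(2, 4, 'borishim kerak')]
```

Running two-token rules first is one simple way to keep a modal or auxiliary construction from being split into separate one-token predicates; the exception set shows how a word such as the pronoun "siz" can be kept from matching the personal-ending rule.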
