Article

Development of an Algorithm for Linguistic Analysis of the Uzbek Texts

Bozorboy Islombekov (Cyber University, Nurafshon, Uzbekistan)
Guli Toirova (Bukhara State University, Bukhara, Uzbekistan)
Mokhira Saparova
Boburjon I. Shermatov (Urgench State University, Urgench, Uzbekistan)
Farida Solieva (National University of Uzbekistan Named After Mirzo Ulugbek, Tashkent, Uzbekistan)
Ruziboy Safarboyev (Jizzakh Polytechnic Institute, Department of Physics, Jizzakh, Uzbekistan)

2025

ABI

Abstract

This paper presents a linguistic analysis algorithm for the Uzbek language that combines a rule-based morphological module with a BiLSTM-CRF sequence-labeling model for syntactic role annotation. The morphological analyzer performs normalization and tokenization, uses a lexicon of lemmas and an inventory of affixes with their allomorphs, and checks the compatibility of affix chains with a finite-state machine. The algorithm also accounts for morphophonological alternations and handles exceptions; residual ambiguity is resolved by local rules and lightweight ranking. The syntactic model takes morphological features (Case, TAM, Voice, Neg, key affix indicators, etc.) together with word-level and character-level representations as input and predicts five functional roles (Ega, Kesim, To'ldiruvchi, Aniqlovchi, Hol) using the BIOES scheme. Experiments on a corpus of over 20,000 sentences with fixed splits demonstrate fairly good performance. The proposed approach combines interpretability and practicality for resource-constrained settings and can serve as the basis for scalable tools for processing Uzbek texts.
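As an illustration of the affix-chain compatibility check mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. The lexicon, the affix inventory, the slot names (`PL`, `POSS`, `CASE`), and the `analyze` function are all hypothetical and deliberately tiny. The sketch assumes only the usual ordering of Uzbek noun suffixes (plural, then possessive, then case) and ignores allomorphy and morphophonological alternations.

```python
# Toy lexicon and affix inventory (hypothetical, far from complete).
LEXICON = {"kitob", "maktab"}  # "book", "school"
AFFIXES = {
    "lar": "PL",                                   # plural
    "im": "POSS", "imiz": "POSS", "ing": "POSS",   # possessive
    "ni": "CASE", "ga": "CASE", "da": "CASE", "dan": "CASE",  # case
}

# Finite-state transitions: which slot may follow which state.
# Noun suffixes are assumed to appear in the order PL < POSS < CASE.
TRANSITIONS = {
    "STEM": {"PL", "POSS", "CASE"},
    "PL": {"POSS", "CASE"},
    "POSS": {"CASE"},
    "CASE": set(),
}


def analyze(word):
    """Return every (lemma, affix_chain) segmentation the toy FSM accepts."""
    results = []

    def walk(lemma, rest, state, chain):
        if not rest:
            # The whole word is consumed: the chain is compatible.
            results.append((lemma, chain))
            return
        for form, slot in AFFIXES.items():
            if rest.startswith(form) and slot in TRANSITIONS[state]:
                walk(lemma, rest[len(form):], slot, chain + [(form, slot)])

    for lemma in sorted(LEXICON):
        if word.startswith(lemma):
            walk(lemma, word[len(lemma):], "STEM", [])
    return results


if __name__ == "__main__":
    # kitob-lar-imiz-da, "in our books": accepted, slots in the right order.
    print(analyze("kitoblarimizda"))
    # Case before plural violates the assumed slot order: no analysis.
    print(analyze("kitobdalar"))
```

A word for which `analyze` returns more than one segmentation would be ambiguous; in the pipeline described above, such cases are the ones passed on to the local rules and lightweight ranking.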

Topics

Text and Document Classification Technologies; Natural Language Processing Techniques; Topic Modeling

Identifiers

DOI: 10.1109/apeie66761.2025.11289446

Citations and references

Citations: 0
References used: 23
