Article

Automatic Identification of Phrase Structures in Uzbek Texts

Elov Botir Boltayevich, Tashkent State University of Uzbek Language and Literature Named After Alisher Navo’i, Dept. of Computational Linguistics and Digital Technologies, Tashkent, Uzbekistan
Abdullayeva Oqila Xolmo‘minovna, Tashkent State University of Uzbek Language and Literature Named After Alisher Navo’i, Dept. of Computational Linguistics and Digital Technologies, Tashkent, Uzbekistan
Abdullayeva Nazokat Isayevna, Samarkand Branch of Tashkent University of Information Technologies Named After Muhammad al-Khwarizmi, Dept. of Computer Systems, Samarkand, Uzbekistan
Saodat Israilova, National University of Uzbekistan Named After Mirzo Ulugbek, Dept. of Uzbek Linguistics, Tashkent, Uzbekistan
Nizomova Fotima Bo‘rixo‘ja Qizi, Tashkent State University of Uzbek Language and Literature Named After Alisher Navo’i, Dept. of Computational Linguistics and Digital Technologies, Tashkent, Uzbekistan

2025

ABI

Abstract

In recent years, various software systems have been developed for text processing in Uzbek, with the aim of improving machine translation, information retrieval, syntactic parsers, language learning tools, and lexicographic systems. In this study, we developed a program for the automatic identification of phrase structures, one of the essential first steps toward building a syntactic parser for Uzbek. Such a tool is crucial not only for accurate meaning extraction and syntactic analysis but also for solving alignment problems in translation systems. We investigated the syntactic templates of Uzbek phrase structures and the patterns of dependency and head-modifier combination. On this basis, we designed a rule-based system grounded in linguistic models and theoretical frameworks and developed its rule set for phrase structure identification. The program was implemented with Python libraries and natural language processing (NLP) algorithms adapted for Uzbek. Uzbek phrase structures include three types: governable, agglutinative, and concordant. A corpus of 100,000 Uzbek sentences was collected and analyzed. We evaluated how effectively the system identifies all three types of phrase structures in sentences and measured the program's average accuracy.
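The abstract does not spell out the rule set itself. As a rough illustration of what a rule-based matcher of head-modifier patterns can look like, here is a minimal Python sketch. It assumes the input is already POS-tagged and morphologically analysed; the `Token` structure, the feature names (`case`, `person`, `number`, `poss`), the `window` parameter, and the specific pattern attached to each of the three phrase types are hypothetical simplifications, not the authors' rules.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """A pre-tagged token (hypothetical format): surface form, coarse POS, morphology."""
    form: str
    pos: str
    feats: dict = field(default_factory=dict)


# Assumption: cases that mark a noun or pronoun as governed by a following verb.
GOVERNED_CASES = {"ACC", "DAT", "LOC", "ABL"}


def is_concordant(dep: Token, head: Token) -> bool:
    # Illustrative pattern: genitive modifier + noun whose possessive suffix
    # agrees with it in person and number, e.g. "mening do‘stim" ("my friend").
    poss = head.feats.get("poss")
    return (
        dep.feats.get("case") == "GEN"
        and head.pos == "NOUN"
        and poss is not None
        and poss == (dep.feats.get("person"), dep.feats.get("number"))
    )


def is_governable(dep: Token, head: Token) -> bool:
    # Illustrative pattern: case-marked noun or pronoun + verb,
    # e.g. "kitobni o‘qidi" ("read the book").
    return (
        dep.pos in {"NOUN", "PRON"}
        and dep.feats.get("case") in GOVERNED_CASES
        and head.pos == "VERB"
    )


def is_agglutinative(dep: Token, head: Token) -> bool:
    # Illustrative assumption: an uninflected adjective, numeral or adverb
    # before its head, e.g. "yangi do‘stim" ("my new friend").
    return (
        dep.pos in {"ADJ", "NUM", "ADV"}
        and "case" not in dep.feats
        and head.pos in {"NOUN", "VERB"}
    )


# Rules are tried in order; the first one that fires labels the pair.
RULES = [
    ("concordant", is_concordant),
    ("governable", is_governable),
    ("agglutinative", is_agglutinative),
]


def match(dep: Token, head: Token):
    """Return the label of the first rule that accepts (dep, head), or None."""
    for label, rule in RULES:
        if rule(dep, head):
            return label
    return None


def find_phrases(tokens, window=3):
    """Pair each token with the nearest following token, within `window`
    positions, that some rule accepts as its head (Uzbek is head-final)."""
    phrases = []
    for i, dep in enumerate(tokens):
        for head in tokens[i + 1 : i + 1 + window]:
            label = match(dep, head)
            if label is not None:
                phrases.append((label, dep.form, head.form))
                break
    return phrases


# "Mening yangi do‘stim kitobni o‘qidi." ("My new friend read the book.")
sentence = [
    Token("Mening", "PRON", {"case": "GEN", "person": 1, "number": "sg"}),
    Token("yangi", "ADJ"),
    Token("do‘stim", "NOUN", {"poss": (1, "sg")}),
    Token("kitobni", "NOUN", {"case": "ACC"}),
    Token("o‘qidi", "VERB"),
]

for phrase in find_phrases(sentence):
    print(phrase)
```

Searching only a few tokens to the right of each modifier reflects Uzbek's head-final word order and lets the concordant pair "Mening ... do‘stim" be found across the intervening adjective. A practical system would still need a much richer rule set, disambiguated morphological analysis, and handling of longer and nested phrases.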


Topics

Natural Language Processing Techniques; Lexicography and Language Studies; Second Language Acquisition and Learning

Identifiers

DOI: 10.1109/ubmk67458.2025.11206816

Citations and References

Citations: 0; References: 5
