Automatic Identification of Phrase Structures in Uzbek Texts
Abstract
In recent years, various software systems have been developed for the Uzbek language to perform text processing tasks that improve the effectiveness of machine translation, information retrieval, syntactic parsers, language learning tools, and lexicographic systems. In this study, we developed a program for the automatic identification of phrase structures, one of the essential first steps toward building a syntactic parser for Uzbek. Such a tool is crucial not only for accurate meaning extraction and syntactic analysis but also for solving alignment problems in translation systems. We investigated the syntactic templates of Uzbek phrase structures and the patterns of dependency and head-modifier combination. On this basis, we designed a rule-based system, including a rule set for phrase structure identification, grounded in linguistic models and theoretical frameworks. The program was implemented using Python libraries and natural language processing (NLP) algorithms adapted for Uzbek. Uzbek phrase structures are of three types: governable, agglutinative, and concordant. We evaluated how effectively the system identifies all three types in sentences and measured its average accuracy. A corpus of 100,000 Uzbek sentences was collected and analyzed.
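The abstract does not specify the rule formalism. As a minimal sketch of how a rule-based identifier of this kind can work, assume the sentence has already been POS-tagged and morphologically analysed. The `Token` class, the tag names, the feature keys, and the three rule functions below are hypothetical illustrations, not the authors' implementation. For illustration only, the sketch also assumes that the "agglutinative" type corresponds to an unmarked adjective + noun sequence, and it only inspects adjacent word pairs, whereas a real rule set would also handle non-adjacent dependents.

```python
from dataclasses import dataclass, field


# Hypothetical token format: surface form, coarse POS tag, and morphological
# features. In practice these would come from a tagger / morphological analyser.
@dataclass
class Token:
    form: str
    pos: str  # assumed tags: NOUN, PRON, ADJ, VERB
    feats: dict = field(default_factory=dict)


# Assumed oblique cases: accusative, dative, locative, ablative.
OBLIQUE_CASES = {"acc", "dat", "loc", "abl"}


def is_concordant(dep, head):
    """Genitive dependent + head noun whose possessive suffix agrees in person/number."""
    return (dep.pos in {"NOUN", "PRON"}
            and dep.feats.get("case") == "gen"
            and head.pos == "NOUN"
            and head.feats.get("poss") is not None
            and head.feats.get("poss") == dep.feats.get("person"))


def is_governable(dep, head):
    """Noun in an oblique case whose case marking is required by the head verb."""
    return (dep.pos == "NOUN"
            and dep.feats.get("case") in OBLIQUE_CASES
            and head.pos == "VERB")


def is_juxtaposed(dep, head):
    """Unmarked adjective + noun (assumed mapping of the 'agglutinative' type)."""
    return dep.pos == "ADJ" and head.pos == "NOUN"


# Order matters: the first rule that matches labels the pair.
RULES = [
    ("concordant", is_concordant),
    ("governable", is_governable),
    ("agglutinative", is_juxtaposed),
]


def find_phrases(tokens):
    """Label adjacent (dependent, head) pairs; Uzbek is head-final, so the dependent comes first."""
    phrases = []
    for dep, head in zip(tokens, tokens[1:]):
        for label, rule in RULES:
            if rule(dep, head):
                phrases.append((label, dep.form, head.form))
                break
    return phrases


# "Mening ukam katta kitobni o'qidi" ("My younger brother read the big book"),
# hand-tagged for illustration.
example = [
    Token("Mening", "PRON", {"case": "gen", "person": "1sg"}),
    Token("ukam", "NOUN", {"case": "nom", "poss": "1sg"}),
    Token("katta", "ADJ"),
    Token("kitobni", "NOUN", {"case": "acc"}),
    Token("o'qidi", "VERB"),
]

print(find_phrases(example))
```

For the example sentence, this labels "Mening ukam" as concordant (genitive pronoun agreeing with the 1sg possessive suffix), "katta kitobni" as agglutinative under the assumption above, and "kitobni o'qidi" as governable (accusative noun governed by the verb). Keeping each type as a separate predicate makes the rule set easy to extend and lets accuracy be measured per phrase type.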