Development of a lexical dataset and a rule-based algorithm for the analysis of Khorezm dialects of the Uzbek language
Abstract
This study presents a dataset of dialect words from the Oghuz dialect group of the Uzbek language. The source is the lexical dictionary published under the supervision of the Uzbek scholar F. Abdullaev. Although this dictionary was published in the last century, all of its words and terms remain in active use today. The Oghuz lexicon of Uzbek predominates in the Khorezm region of Uzbekistan, where the dialect has almost 2 million speakers. The work is all the more relevant because the dialect is also widespread in the neighboring Dashoguz (Tashauz) region of Turkmenistan. Each entry in the dataset records the dialect word in Cyrillic and Latin script, its English translation, the formal equivalent of each word form, and the region in which the dialect word is used.
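The dataset parameters listed above can be pictured as one record per dialect word. The following is a minimal sketch in Python, not the published schema: the class name `DialectEntry`, its field names, and the helper `build_index` are hypothetical. The sample entry is an illustration only and is not taken from the dataset. The sketch also assumes that "formal equivalent" means the corresponding standard Uzbek form.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class DialectEntry:
    """One dataset row (hypothetical field names, not the published schema)."""
    cyrillic: str                  # dialect word in Cyrillic script
    latin: str                     # dialect word in Latin script
    english: str                   # English translation
    formal_equivalent: str         # assumed: the standard Uzbek form
    regions: Tuple[str, ...] = ()  # regions where the word is used


def build_index(entries: Iterable[DialectEntry]) -> Dict[str, DialectEntry]:
    """Map both spellings of each word (lowercased) to its entry."""
    index: Dict[str, DialectEntry] = {}
    for entry in entries:
        index[entry.cyrillic.lower()] = entry
        index[entry.latin.lower()] = entry
    return index


# Illustrative entry only; it is NOT taken from the dataset.
SAMPLE = DialectEntry(
    cyrillic="гал",
    latin="gal",
    english="come",
    formal_equivalent="kel",
    regions=("Khorezm",),
)

INDEX = build_index([SAMPLE])
```

Indexing by both scripts lets a lookup accept input in either Cyrillic or Latin, for example `INDEX["gal"].formal_equivalent`.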