Article

Development of a lexical dataset and a rule-based algorithm for the analysis of Khorezm dialects of the Uzbek language

Nilufar Abdurakhmonova, National University of Uzbekistan named after Mirzo Ulugbek, 4, University str., 100174 Tashkent city, Uzbekistan
Gulnora Astanova, Bukhara State University, 11, Iqbal str., Bukhara city 200100, Uzbekistan
Atoullo Akhmedov, Bukhara State University, 11, Iqbal str., Bukhara city 200100, Uzbekistan
Davlatyor Mengliev, Computer Sciences, Scientific Department, Cyber University, Nurafshon, Uzbekistan
Bahodir Ibragimov, Urgench State University, 14, Kh.Alimdjan str., Urgench city 220100, Uzbekistan
Anvar Abdullayev, Computer Sciences, Scientific Department, Cyber University, Nurafshon, Uzbekistan

Data in Brief · journal · 2025 · en

ABI

Abstract

This study presents a dataset of Oguz dialect words of the Uzbek language. The source is the lexical dictionary published under the supervision of the Uzbek scholar F. Abdullaev. Although the dictionary was published in the last century, all of its words and terms remain in active use today. The Oguz lexicon of Uzbek dominates in the Khorezm region of Uzbekistan, where the dialect has almost 2 million speakers. The work is also relevant because the dialect is widespread in the neighboring Tashkhauz region of the Republic of Turkmenistan. Each entry in the dataset records the dialect word in Cyrillic and Latin script, its English translation, the formal equivalent of each word form, and the region where the dialect word is used.
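The entry layout described in the abstract can be pictured with a minimal Python sketch. Everything named here is an assumption made for illustration: the field names (`cyrillic`, `latin`, `english`, `standard_form`, `region`), the helpers `build_index` and `normalize`, and the single sample entry, which is a hypothetical example and is not taken from the published dataset. The lookup is a plain dictionary match, one simple kind of rule, and is not the authors' algorithm.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class DialectEntry:
    """One dataset row; the field names are assumed, not the published headers."""
    cyrillic: str        # dialect word in Cyrillic script
    latin: str           # dialect word in Latin script
    english: str         # English translation
    standard_form: str   # formal (literary Uzbek) equivalent
    region: str          # region where the dialect word is used


def build_index(entries: Iterable[DialectEntry]) -> Dict[str, DialectEntry]:
    """Index every entry under both of its spellings, lowercased."""
    index: Dict[str, DialectEntry] = {}
    for entry in entries:
        index[entry.cyrillic.lower()] = entry
        index[entry.latin.lower()] = entry
    return index


def normalize(token: str, index: Dict[str, DialectEntry]) -> str:
    """Return the formal equivalent of a dialect token, or the token unchanged."""
    entry = index.get(token.lower())
    return entry.standard_form if entry is not None else token


# Hypothetical sample entry, for illustration only (not from the dataset).
SAMPLE = [DialectEntry("гал", "gal", "come", "kel", "Khorezm")]
INDEX = build_index(SAMPLE)

print(normalize("Gal", INDEX))
print(normalize("kitob", INDEX))
```

Indexing both scripts in one dictionary keeps the lookup independent of whether the input text is written in Cyrillic or Latin; tokens that are not in the index pass through unchanged.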

Topics

Economic and Industrial Development
Advanced Computational Techniques in Science and Engineering
Education, Innovation and Language Studies

Identifiers

DOI: 10.1016/j.dib.2025.112216

Citations and references

Citations: 0
References used: 7
