Article

Oghuz Dialect Analysis of the Uzbek Language: Methodological Approach and Experimental Study

Saidbek P. Babayazov, Cyber University, Nurafshon, Uzbekistan
Sh. Kh. Ismoilov, Urgench Innovation University, Urgench, Uzbekistan
Obidjan Madaminov, Urgench State University, Urgench, Uzbekistan
Umidbek P. Babayazov, Urgench State University, Urgench, Uzbekistan
Nafisa Ruzimova
Dilfuza Xajiyeva, Urgench State Pedagogical Institute, Urgench, Uzbekistan

2025

ABI

Abstract

In this research, we present a simple-to-implement yet effective detector for the Oghuz dialect of Uzbek. The method builds on standard natural language preprocessing steps, namely normalization, tokenization, and spelling-aware regular expressions, and analyzes text with a carefully selected set of diagnostic features (euklama enhancers, connectors, and auxiliary particles). Each text is scored by dividing its total number of pattern matches by its number of tokens, and a single adjustable threshold separates dialectal from standard Uzbek. On a stratified development set, the rule-based system shows strong separability and reaches a practical operating point with high precision and recall. Adding a TF-IDF + logistic regression layer gives a small improvement on edge cases while keeping the system interpretable and computationally cheap. A detailed error analysis identifies the main error types (interscript variation, overlaps with colloquial usage, and the handling of multi-word and clitic fragments) and motivates targeted corrections to normalization, matching constraints, and multi-word unit (MWU) rules. Beyond classification, the pattern inventory supports corpus building and model training by providing pattern-based diagnostics, which allow gradual refinement and, where needed, larger updates to context encoders.
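The scoring rule summarized above (pattern matches divided by token count, compared against one threshold) can be sketched in Python. This is a minimal sketch, not the authors' implementation: the apostrophe normalization, the tokenizer regex, the two diagnostic patterns (`gel…` and `var`), and the default threshold of 0.05 are hypothetical placeholders chosen only for illustration.

```python
import re
import unicodedata

# Hypothetical set of apostrophe variants used for o', g' and the tutuq belgisi
# in Uzbek Latin text; all are mapped to the ASCII apostrophe.
_APOSTROPHES = "\u2018\u2019\u02bb\u02bc`\u00b4"
_APOSTROPHE_TABLE = str.maketrans({ch: "'" for ch in _APOSTROPHES})

# Tokens are runs of word characters, with apostrophes kept inside words.
_TOKEN_RE = re.compile(r"[\w']+")

# HYPOTHETICAL diagnostic patterns for illustration only; not the study's inventory.
DIAGNOSTIC_PATTERNS = [
    re.compile(r"\bgel\w*"),  # placeholder: g-initial verb stem
    re.compile(r"\bvar\b"),   # placeholder: existential word form
]


def normalize(text: str) -> str:
    """Apply Unicode NFC, lowercase, and unify apostrophe variants."""
    return unicodedata.normalize("NFC", text).lower().translate(_APOSTROPHE_TABLE)


def tokenize(text: str) -> list[str]:
    """Split normalized text into word tokens."""
    return _TOKEN_RE.findall(text)


def dialect_score(text: str, patterns=DIAGNOSTIC_PATTERNS) -> float:
    """Total pattern matches divided by the number of tokens (0.0 for empty text)."""
    norm = normalize(text)
    tokens = tokenize(norm)
    if not tokens:
        return 0.0
    # Matching on the whole normalized string lets a pattern span several tokens.
    matches = sum(len(p.findall(norm)) for p in patterns)
    return matches / len(tokens)


def is_dialectal(text: str, threshold: float = 0.05) -> bool:
    """Label a text as dialectal when its match density reaches the threshold."""
    return dialect_score(text) >= threshold
```

For example, `dialect_score("Men ertaga gelaman, uyda var.")` finds two matches among five tokens and returns 0.4, while `dialect_score("Men ertaga kelaman.")` returns 0.0. Because patterns run over the normalized string rather than token by token, a multi-word expression could be written as a single regex, which is one possible way to encode MWU rules.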
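One common way to pick the single adjustable threshold, assumed here rather than taken from the study, is to sweep candidate values over development-set scores and keep the one with the highest F1. The function names and the F1 criterion below are illustrative assumptions; the input is any list of per-text density scores with gold dialect labels.

```python
from typing import Sequence


def precision_recall(
    scores: Sequence[float], labels: Sequence[bool], threshold: float
) -> tuple[float, float]:
    """Precision and recall when texts with score >= threshold are labeled dialectal."""
    tp = fp = fn = 0
    for score, is_dialect in zip(scores, labels):
        predicted = score >= threshold
        if predicted and is_dialect:
            tp += 1
        elif predicted and not is_dialect:
            fp += 1
        elif not predicted and is_dialect:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_threshold(
    scores: Sequence[float], labels: Sequence[bool]
) -> tuple[float, float]:
    """Return (threshold, f1) that maximizes F1 over the observed score values."""
    best_t, best_f1 = 0.0, 0.0
    for t in sorted(set(scores)):
        p, r = precision_recall(scores, labels, t)
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

With scores `[0.0, 0.02, 0.10, 0.25]` and labels `[False, False, True, True]`, the sweep selects 0.10, where precision and recall are both 1.0. A different criterion (for example, a minimum required precision) would only change the comparison inside the loop.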

Topics

Authorship Attribution and Profiling; Linguistics and Cultural Studies; Natural Language Processing Techniques

Identifiers

DOI: 10.1109/apeie66761.2025.11289360

Citations and references

Cited by 0 · 15 references
