Article

Hybrid Approach to Genre Classification of Karakalpak Texts in Telecommunications and Energy Domains

Davlatyor Mengliev, Cyber University, Nurafshon, Uzbekistan
Guli Toirova, Bukhara State University, Bukhara, Uzbekistan
Rayhon Ergasheva, Termez University of Economics and Service, Termez, Uzbekistan
Khamidillo Atakhanov, Chirchik State Pedagogical University, Chirchik, Uzbekistan
Zulfiya Rustamova, Tashkent State Pedagogical University, Tashkent, Uzbekistan
Nodirbek Boltayev, Urgench State University, Urgench, Uzbekistan

2025


Abstract

This paper proposes a hybrid approach to classifying texts in the Karakalpak language that combines rule-based preprocessing algorithms with modern neural network methods. Telecommunications and energy were selected as the target domains. At the preprocessing stage, texts are converted from Cyrillic to Latin script, which unifies the corpus and simplifies the work of the later stages. The texts are then classified by a neural network model built on a blank spaCy model and trained on a dataset of 10,000 sentences. In testing, the model reached an average F1 score of about 87%; after postprocessing was applied, the score rose to 95.1%, which confirms the effectiveness of the proposed approach. The paper also reviews existing related research to establish its relevance, and describes the nature and morphology of the Karakalpak language to help readers understand the difficulties of processing it.
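The Cyrillic-to-Latin step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' rules, so the mapping table `CYR_TO_LAT` and the helper names `build_table` and `to_latin` below are hypothetical. The table is an illustrative subset of letter correspondences and should be checked against the official Karakalpak Latin alphabet before any real use.

```python
# Illustrative subset of a Karakalpak Cyrillic-to-Latin table.
# Assumption: this is NOT the authors' table; verify it against the
# official alphabet before relying on it.
CYR_TO_LAT = {
    "а": "a", "ә": "á", "б": "b", "в": "v", "г": "g", "ғ": "ǵ", "д": "d",
    "е": "e", "ж": "j", "з": "z", "и": "i", "й": "y", "к": "k", "қ": "q",
    "л": "l", "м": "m", "н": "n", "ң": "ń", "о": "o", "ө": "ó", "п": "p",
    "р": "r", "с": "s", "т": "t", "у": "u", "ү": "ú", "ў": "w", "ф": "f",
    "х": "x", "ҳ": "h", "ш": "sh", "ы": "ı",
}


def build_table(mapping):
    """Build a str.translate table covering lowercase and uppercase letters."""
    table = {}
    for cyr, lat in mapping.items():
        table[ord(cyr)] = lat
        # Uppercase Cyrillic maps to a capitalized Latin form, e.g. "Ш" -> "Sh".
        table[ord(cyr.upper())] = lat.capitalize()
    return table


_TABLE = build_table(CYR_TO_LAT)


def to_latin(text):
    """Transliterate mapped Cyrillic letters; leave all other characters as-is."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(to_latin("қала"))
```

A plain character table like this handles only one-to-one and one-to-many letter substitutions. Context-dependent rules, if the authors use any, would need an additional rule layer on top of it.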

Topics

Linguistics and terminology studies; Authorship Attribution and Profiling; Text and Document Classification Technologies

Identifiers

DOI: 10.1109/apeie66761.2025.11289307

Citations and references

Cited by 0; 14 references
