Article

Bridging Legal and Medical Texts Through a Dual-Pipeline Classification Approach in Karakalpak Texts

Davlatyor Mengliev (Cyber University, Nurafshon, Uzbekistan)
Шахноза Буриевна Наширова (Karshi State University, Karshi, Uzbekistan)
Umida Khudaybergenova
Nina ROGOZINNIKOVA (National University of Uzbekistan Named After Mirzo Ulugbek, Tashkent, Uzbekistan)
Rashid Zohidov (Tashkent State University of Uzbek Language and Literature Named After Alisher Navai, Tashkent, Uzbekistan)
G. Rakhimova (Urgench State University, Urgench, Uzbekistan)

2025

ABI

Abstract

This article presents the development and testing of three models for processing texts in the Karakalpak language. The first is a logistic regression model that classifies each text into one of two target domains, medical or legal; the authors consider it well suited to this task. The other two are named-entity recognition models based on a bidirectional LSTM, one per domain. Each is trained only on texts of its own domain in order to improve the chances of successful text analysis. The authors also compare their approach with existing scientific work to show the relevance of the study, and they provide background on the Karakalpak language to help readers understand how the algorithm operates on texts in this language.
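The dual-pipeline structure described in the abstract can be illustrated with a minimal sketch: a logistic regression classifier picks the domain, and the text is then routed to that domain's entity tagger. Everything below is an assumption made for illustration and is not the authors' implementation: the toy training sentences are English placeholders rather than Karakalpak data, the bag-of-words features and training loop are a plain-Python stand-in, and `medical_tagger` and `legal_tagger` are hypothetical stubs in place of the trained bidirectional LSTM models.

```python
import math
from collections import Counter

# Hypothetical toy data (English placeholders, NOT Karakalpak text).
# Label 1 = medical, label 0 = legal.
TRAIN = [
    ("patient diagnosis treatment hospital", 1),
    ("doctor prescribed medicine patient", 1),
    ("court ruling contract law", 0),
    ("lawyer filed appeal court", 0),
]
VOCAB = sorted({tok for text, _ in TRAIN for tok in text.split()})


def featurize(text):
    """Bag-of-words counts over VOCAB; unknown tokens are ignored."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in VOCAB]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(samples, lr=0.5, epochs=200):
    """Fit logistic regression with per-sample gradient descent on log-loss."""
    w = [0.0] * len(VOCAB)
    b = 0.0
    for _ in range(epochs):
        for text, y in samples:
            x = featurize(text)
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
            g = p - y  # gradient of the log-loss with respect to the logit
            w = [wj - lr * g * xj for wj, xj in zip(w, x)]
            b -= lr * g
    return w, b


WEIGHTS, BIAS = train_logreg(TRAIN)


def medical_tagger(tokens):
    # Placeholder for the medical-domain BiLSTM tagger (assumption):
    # labels every token "O", i.e. "not an entity".
    return [(tok, "O") for tok in tokens]


def legal_tagger(tokens):
    # Placeholder for the legal-domain BiLSTM tagger (assumption).
    return [(tok, "O") for tok in tokens]


TAGGERS = {"medical": medical_tagger, "legal": legal_tagger}


def route(text):
    """Classify the domain, then hand the tokens to that domain's tagger."""
    x = featurize(text)
    p_medical = sigmoid(BIAS + sum(wj * xj for wj, xj in zip(WEIGHTS, x)))
    domain = "medical" if p_medical >= 0.5 else "legal"
    return domain, TAGGERS[domain](text.split())


if __name__ == "__main__":
    print(route("patient visited hospital"))
    print(route("court contract dispute"))
```

The point of the sketch is the routing step: the classifier's output only selects which tagger runs, so each tagger can be trained and replaced independently of the other.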

Topics

Text and Document Classification Technologies
Topic Modeling
Advanced Computational Techniques in Science and Engineering

Identifiers

DOI: 10.1109/apeie66761.2025.11289284

Citations and references

Cited by 0
17 references
