Bridging Legal and Medical Texts Through a Dual-Pipeline Classification Approach in Karakalpak Texts
Abstract
This article presents the development and evaluation of three models for processing texts in the Karakalpak language. To classify texts into one of two target domains, medical or legal, the authors use logistic regression, which is well suited to this binary task. In addition, a separate named entity recognition model based on a bidirectional LSTM (BiLSTM) was built for each of these domains. Each model is trained only on texts from its own domain in order to improve the accuracy of text analysis. The authors also compare the proposed approach with existing studies to establish the relevance of this work. Finally, the article provides background on the Karakalpak language that helps explain how the algorithms operate on texts in this language.
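The dual pipeline described above can be sketched as follows: a binary domain classifier decides whether a text is legal or medical, and that decision selects which domain-specific entity recognizer runs. This is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation. Logistic regression is written from scratch over bag-of-words counts (`BowLogisticRegression`, with an assumed 0.5 decision threshold), the BiLSTM NER models are replaced by hypothetical lexicon-lookup taggers (`make_lexicon_tagger`), and the training sentences and entity labels are invented English placeholders standing in for a labelled Karakalpak corpus.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

# A tagger maps a token list to (token, label) pairs
Tagger = Callable[[List[str]], List[Tuple[str, str]]]


def tokenize(text: str) -> List[str]:
    # Unicode-aware word split, so Latin-script letters with diacritics stay inside tokens
    return re.findall(r"\w+", text.lower())


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class BowLogisticRegression:
    """Binary logistic regression over bag-of-words counts (1 = legal, 0 = medical)."""

    def __init__(self, lr: float = 0.5, epochs: int = 200) -> None:
        self.lr = lr
        self.epochs = epochs
        self.weights: Dict[str, float] = {}
        self.bias = 0.0

    def _prob(self, counts: Counter) -> float:
        z = self.bias + sum(self.weights.get(t, 0.0) * c for t, c in counts.items())
        return sigmoid(z)

    def fit(self, texts: List[str], labels: List[int]) -> "BowLogisticRegression":
        data = [(Counter(tokenize(t)), y) for t, y in zip(texts, labels)]
        for _ in range(self.epochs):
            for counts, y in data:
                # SGD step on the log-loss; its gradient w.r.t. each weight is (p - y) * x
                err = self._prob(counts) - y
                self.bias -= self.lr * err
                for t, c in counts.items():
                    self.weights[t] = self.weights.get(t, 0.0) - self.lr * err * c
        return self

    def predict_proba(self, text: str) -> float:
        # Probability that the text belongs to the legal domain
        return self._prob(Counter(tokenize(text)))


def make_lexicon_tagger(lexicon: Dict[str, str]) -> Tagger:
    # Hypothetical stand-in for a domain-specific BiLSTM NER model:
    # tokens found in the lexicon get its label, all others get "O"
    def tag(tokens: List[str]) -> List[Tuple[str, str]]:
        return [(tok, lexicon.get(tok, "O")) for tok in tokens]
    return tag


def route(text: str, clf: BowLogisticRegression, taggers: Dict[str, Tagger],
          threshold: float = 0.5) -> Tuple[str, List[Tuple[str, str]]]:
    # Stage 1: pick the domain; stage 2: run only that domain's entity tagger
    domain = "legal" if clf.predict_proba(text) >= threshold else "medical"
    return domain, taggers[domain](tokenize(text))


# Toy English placeholder data standing in for a labelled Karakalpak corpus
TRAIN_TEXTS = [
    "the court reviewed the contract",
    "the law defines the penalty",
    "the judge signed the court order",
    "the doctor prescribed the medicine",
    "the patient has a fever",
    "the doctor treated the patient",
]
TRAIN_LABELS = [1, 1, 1, 0, 0, 0]

clf = BowLogisticRegression().fit(TRAIN_TEXTS, TRAIN_LABELS)
taggers = {
    "legal": make_lexicon_tagger({"court": "ORG", "judge": "PER", "law": "LAW"}),
    "medical": make_lexicon_tagger({"doctor": "PER", "fever": "SYMPTOM", "medicine": "DRUG"}),
}

if __name__ == "__main__":
    for sentence in ["the judge upheld the contract", "the doctor checked the fever"]:
        print(route(sentence, clf, taggers))
```

Because every tagger shares the same `tokens -> (token, label)` interface, each lexicon stand-in could in principle be replaced by a trained BiLSTM sequence tagger without changing `route`, and each tagger only ever sees texts from its own domain.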