Article

Detecting Allusions in the Karakalpak Language Using mBERT

Davlatyor Mengliev, Cyber University, Nurafshon, Uzbekistan
Raima Shirinova, National University of Uzbekistan Named After Mirzo Ulugbek, Tashkent, Uzbekistan
T. Khudayberganov, Urgench State University, Urgench, Uzbekistan
Shaxnoza Abdukayumova, Andijan State University, Andijan, Uzbekistan
Xilola Yunusova, National University of Uzbekistan Named After Mirzo Ulugbek, Tashkent, Uzbekistan
Intizor Djumaniyazova, Mamun University in Khiva, Khiva, Uzbekistan

2025

Abstract

This paper investigates allusion detection in Karakalpak texts using a neural language model, mBERT. Although the problem has been studied fairly well for widely spoken languages such as English, Russian, and Chinese, there are almost no studies for low-resource languages. The proposed solution includes not only the preparation of a language model but also the construction of a corpus of literary texts containing more than 5,000 sentences. To prevent overfitting, an early stopping mechanism was used, which allowed us to identify the best model performance indicators. Empirical results on two different test sets show that the model works reliably on literary texts but shows a noticeable drop in performance on texts covering a variety of topics. In addition, a comparative analysis of existing solutions is carried out to emphasize the relevance of the work. The authors also plan to expand the dataset with a wider variety of literary topics, as well as texts in informal genres, to make the developed solution more useful in applied settings.
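
The abstract mentions early stopping as the safeguard against overfitting but gives no details. The following is only a minimal sketch of the general technique, not the authors' implementation: the function name `train_with_early_stopping`, the `evaluate_epoch` callback, and the `patience` and `min_delta` values are all hypothetical, and the loss values are toy numbers rather than results from the paper.

```python
from typing import Callable, List, Tuple


def train_with_early_stopping(
    evaluate_epoch: Callable[[int], float],
    max_epochs: int = 20,
    patience: int = 3,
    min_delta: float = 0.0,
) -> Tuple[int, float]:
    """Run up to max_epochs; stop once validation loss has not improved
    for `patience` consecutive epochs. Returns (best_epoch, best_loss)."""
    best_epoch, best_loss = -1, float("inf")
    stale = 0  # epochs since the last improvement
    for epoch in range(max_epochs):
        # Hypothetical callback: trains one epoch and returns validation loss.
        val_loss = evaluate_epoch(epoch)
        if val_loss < best_loss - min_delta:
            best_epoch, best_loss = epoch, val_loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_epoch, best_loss


# Toy validation curve (illustrative numbers only, not from the paper).
losses: List[float] = [0.90, 0.70, 0.60, 0.62, 0.65, 0.66, 0.50]
calls: List[int] = []


def fake_epoch(epoch: int) -> float:
    calls.append(epoch)
    return losses[epoch]


best = train_with_early_stopping(fake_epoch, max_epochs=len(losses), patience=3)
print(best, calls)
```

In this toy run, training halts after three epochs without improvement, so the final epoch is never evaluated. In practice the model weights from the best epoch would be saved and restored.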

Topics

Authorship Attribution and Profiling; Topic Modeling; Handwritten Text Recognition Techniques

Identifiers

DOI: 10.1109/apeie66761.2025.11289215

Citations and references

Citations: 0
References used: 11