Article

Development of a Domain-Specific Named Entity Recognition Model for Pedagogical and Valeological Uzbek Texts

Zamira Yusupova, Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Tashkent, Uzbekistan
Rayhon Sapaeva, Urgench State University named after Abu Rayhon Beruni, Urgench, Uzbekistan
Sarbinaz Kazakbaeva, Cyber University, Nurafshon, Uzbekistan

2025

ABI

Abstract

This paper presents a named entity recognition model for Uzbek texts on pedagogy and valeology. The authors compiled a custom dataset from multiple sources, annotated with the BIOES scheme and an extended entity set that covers general categories (persons, organizations, locations) as well as domain-specific terms. The dataset comprises more than 5,000 annotated sentences and more than 6,000 named entity mentions across these categories. A conditional random field (CRF) model, BiLSTM-CRF, and the multilingual mBERT model were compared. The results show consistent improvement when moving from classical to neural and then to transformer-based architectures, with the largest gains for multi-word domain terms and organization names. The proposed solution and annotated corpus enable reproducible experiments with low-resource agglutinative languages and provide a practical foundation for downstream tasks such as semantic search, knowledge graph construction, and educational and medical information retrieval. Resources and implementation details are described to facilitate reuse and extension. The authors also describe the features of the Uzbek language that are relevant to the task, including its morphology and agglutinative properties.
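For readers unfamiliar with the BIOES scheme mentioned in the abstract, the following is a minimal sketch of how token-level entity spans map to BIOES tags (B = begin, I = inside, E = end, S = single-token entity, O = outside). The function name `spans_to_bioes`, the label names `ORG` and `LOC`, and the example tokens are illustrative assumptions; they are not taken from the paper's corpus or code.

```python
from typing import List, Tuple

# (start token index, end token index exclusive, entity label)
Span = Tuple[int, int, str]


def spans_to_bioes(n_tokens: int, spans: List[Span]) -> List[str]:
    """Convert non-overlapping token spans into a BIOES tag sequence."""
    tags = ["O"] * n_tokens  # every token starts outside any entity
    for start, end, label in spans:
        if not (0 <= start < end <= n_tokens):
            raise ValueError(f"invalid span: {(start, end, label)}")
        if end - start == 1:
            # Single-token entity
            tags[start] = f"S-{label}"
        else:
            # Multi-token entity: begin, zero or more inside tags, end
            tags[start] = f"B-{label}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"
            tags[end - 1] = f"E-{label}"
    return tags


# Hypothetical example: a three-token organization name and a one-token location.
tokens = ["Urganch", "davlat", "universiteti", "va", "Toshkent"]
spans = [(0, 3, "ORG"), (4, 5, "LOC")]
for token, tag in zip(tokens, spans_to_bioes(len(tokens), spans)):
    print(f"{token}\t{tag}")
```

Compared with plain BIO tagging, the explicit E and S tags mark entity boundaries directly, which can help with the multi-word domain terms and organization names that the abstract highlights.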

Topics

Topic Modeling; Advanced Graph Neural Networks; Text Readability and Simplification

Identifiers

DOI: 10.1109/apeie66761.2025.11289365

Citations and references

Citations: 0
References used: 20
