Article

Identification of Named Entities from Uzbek Historical Texts: A Multilingual BERT Approach

Nodirbek Boltayev, Urgench State University, Urgench, Uzbekistan
Munisa Nayimova, Bukhara State University, Bukhara, Uzbekistan
Н. П. Абубакирова
Alisher Abidjanov, Cyber University, Nurafshon, Uzbekistan
Kamolova Madina, Samarkand State Institute of Foreign Languages, Samarkand, Uzbekistan
Sevara Berdimurotova, Termez University of Economics and Service, Termez, Uzbekistan

2025

ABI

Abstract

This paper presents an algorithm for recognizing named entities in Uzbek historical texts dating from 1928 to 1940. For this task, we used the Multilingual BERT deep learning model, trained on a custom dataset. The dataset consists of 5,500 sentences, each annotated with the BIOES scheme, which we chose because it is one of the most widely used annotation schemes for named entity recognition. The named entity categories are organizations, persons, and locations. The model was trained with an early stopping mechanism, which selected the weights with the best metric values, reached at the 11th training epoch. For an objective assessment, we tested the model on historical texts covering various topics and on modern Uzbek texts. The results confirmed the model's high effectiveness on historical data and revealed a significant drop in accuracy on modern texts.
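In the BIOES scheme, each token is tagged as the Beginning, Inside, or End of a multi-token entity, as a Single-token entity, or as Outside any entity. The sketch below shows how span annotations could be converted into BIOES tags. It assumes that entity spans are given as inclusive token-index ranges and that the three categories are labeled PER, ORG, and LOC. The span format, the label strings, the function name `spans_to_bioes`, and the example sentence are all illustrative assumptions, not the authors' annotation tooling.

```python
from typing import List, Tuple

# Assumed span format: (first_token_index, last_token_index_inclusive, label).
# Label strings such as "PER", "ORG", "LOC" are illustrative assumptions.
Span = Tuple[int, int, str]


def spans_to_bioes(num_tokens: int, spans: List[Span]) -> List[str]:
    """Convert entity spans over a tokenized sentence into BIOES tags.

    Tokens outside every entity get "O". A one-token entity gets "S-<label>".
    A longer entity gets "B-<label>" on its first token, "I-<label>" on any
    inner tokens, and "E-<label>" on its last token.
    """
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        # Reject spans that fall outside the sentence or are reversed.
        if not (0 <= start <= end < num_tokens):
            raise ValueError(f"span ({start}, {end}) is out of range")
        # BIOES assigns one tag per token, so overlapping spans are rejected.
        if any(tag != "O" for tag in tags[start:end + 1]):
            raise ValueError(f"span ({start}, {end}) overlaps another entity")
        if start == end:
            tags[start] = f"S-{label}"
        else:
            tags[start] = f"B-{label}"
            for i in range(start + 1, end):
                tags[i] = f"I-{label}"
            tags[end] = f"E-{label}"
    return tags


if __name__ == "__main__":
    # Hypothetical example sentence: "Abdulla Qodiriy Toshkentda tug'ilgan ."
    tokens = ["Abdulla", "Qodiriy", "Toshkentda", "tug'ilgan", "."]
    spans = [(0, 1, "PER"), (2, 2, "LOC")]
    print(spans_to_bioes(len(tokens), spans))
    # prints ['B-PER', 'E-PER', 'S-LOC', 'O', 'O']
```

Overlapping spans are rejected because BIOES, like the simpler BIO scheme, gives each token exactly one tag and therefore cannot represent nested entities.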

Topics

Topic Modeling; Text and Document Classification Technologies; Sentiment Analysis and Opinion Mining

Identifiers

DOI: 10.1109/apeie66761.2025.11289339

Citations and references

0 citations, 21 references
