Article

Development of Named Entity Recognition Model for Analysis of Oceanographic Texts in Uzbek Language

Davlatyor Mengliev, Novosibirsk State University, Novosibirsk, Russia
Nilufar Abdurakhmonova, Department of Computer Linguistics, National University of Uzbekistan Named after Mirzo Ulugbek, Tashkent, Uzbekistan
Vladimir Barakhnin, Federal Research Center for Information and Computational Technologies, Novosibirsk, Russia
Gavhar I. Kuvondikova, National University of Uzbekistan Named after Mirzo Ulugbek, Tashkent, Uzbekistan
Zebo G. Kadirova, National University of Uzbekistan Named after Mirzo Ulugbek, Tashkent, Uzbekistan
Bahodir Ibragimov, Urgench State University, Urgench, Uzbekistan

2024 (en)

Abstract

This paper presents the development of a language model for recognizing named entities in Uzbek-language texts on oceanology and navigation. The model was trained on a corpus of 5,000 oceanology-related sentences containing more than 33,000 manually annotated words. The data were labeled with the BIOES scheme, which makes it possible to label both single-word entities and multi-word phrases. The trained model proved effective at recognizing entities such as geographic features, natural phenomena, and vehicles. On the test texts, the model reached an accuracy of 88% and a recall of 94%. Despite these results, its accuracy decreased on texts from other domains, indicating the need for further improvement. The authors also compare their approach with existing research in this area in order to arrive at a more relevant solution to the problem. The article concludes by discussing the prospects for improving the model and expanding the scope of its application.
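To illustrate how the BIOES scheme mentioned in the abstract covers both single-word entities and multi-word phrases, here is a minimal sketch. It is not the authors' annotation tooling: the function name `bioes_tags`, the labels `GEO` and `VEH`, and the English example sentence are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical helper: converts entity spans into BIOES tags.
# Each span is (start, end_exclusive, label); token indices are 0-based.
def bioes_tags(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    tags = ["O"] * n_tokens                  # O = outside any entity
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"S-{label}"       # S = single-token entity
        else:
            tags[start] = f"B-{label}"       # B = beginning of a phrase
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"       # I = inside a phrase
            tags[end - 1] = f"E-{label}"     # E = end of a phrase
    return tags

# Illustrative sentence and labels (assumed, not taken from the paper's corpus).
tokens = ["The", "ship", "crossed", "the", "Indian", "Ocean", "near", "Madagascar"]
spans = [(1, 2, "VEH"), (4, 6, "GEO"), (7, 8, "GEO")]
tags = bioes_tags(len(tokens), spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Here "ship" and "Madagascar" each receive an `S-` tag because they are single-token entities, while "Indian Ocean" is marked with `B-GEO` followed by `E-GEO`. A phrase of three or more tokens would additionally use `I-` tags for its middle tokens.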

Topics

Topic Modeling

Identifiers

DOI: 10.1109/ictacs62700.2024.10840741

Citations and references

Cited by 2, 15 references