Development of a Named Entity Recognition Model for the Analysis of Oceanographic Texts in the Uzbek Language
Abstract
This paper presents the development of a language model for recognizing named entities in Uzbek-language texts on oceanology and navigation. The study used a corpus of 5,000 oceanology-related sentences containing more than 33,000 manually annotated words. The data were labeled with the BIOES scheme (Begin, Inside, Outside, End, Single), which makes it possible to label both single-word entities and multi-word phrases. The trained model proved effective at recognizing entities such as geographic features, natural phenomena, and vehicles. On the test texts, the model achieved an accuracy of 88% and a recall of 94%. Despite these results, its accuracy decreased on texts from other domains, which indicates the need for further improvement. The authors also compare their approach with existing research in this area in order to arrive at a more relevant solution to the problem. Finally, the article discusses the prospects for improving the model and expanding its scope of application.
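To illustrate how the BIOES scheme distinguishes single-word entities from multi-word phrases, the following minimal Python sketch converts entity spans into per-token tags. The function name `bioes_tags`, the labels `VEH` and `GEO`, and the example sentence are assumptions made for illustration only; they are not taken from the annotated corpus or from the authors' tooling.

```python
from typing import List, Tuple


def bioes_tags(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert entity spans into BIOES tags.

    Each span is (start, end_exclusive, label). Tokens outside every span
    get "O"; a one-token entity gets "S-"; a longer entity gets "B-" on its
    first token, "I-" on inner tokens, and "E-" on its last token.
    """
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"S-{label}"          # single-word entity
        else:
            tags[start] = f"B-{label}"          # beginning of a phrase
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"          # inside of a phrase
            tags[end - 1] = f"E-{label}"        # end of a phrase
    return tags


# Hypothetical example sentence: "The ship sailed across the Pacific Ocean."
tokens = ["Kema", "Tinch", "okeani", "bo'ylab", "suzdi"]
# Hypothetical annotation: "Kema" is a vehicle, "Tinch okeani" a geographic feature.
spans = [(0, 1, "VEH"), (1, 3, "GEO")]

for token, tag in zip(tokens, bioes_tags(len(tokens), spans)):
    print(f"{token}\t{tag}")
```

In this sketch the single-word entity receives an `S-` tag, while the two-word phrase is marked with `B-` and `E-`, so entity boundaries remain unambiguous even when two entities are adjacent.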