Article

Advancing Oceanology Studies in Karakalpak: A Named Entity Recognition Algorithmic Framework

Bahodir Ibragimov, Urgench State University, Department of Information Technologies, Urgench, Uzbekistan
Adina D. Egamberganova, Urgench branch of Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Urgench, Uzbekistan
Saida Khamraeva, Urgench branch of Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Urgench, Uzbekistan
Diloram A. Fattaxova, National University of Uzbekistan named after Mirzo Ulugbek, Tashkent, Uzbekistan
Ziyoda Kasimova, National University of Uzbekistan named after Mirzo Ulugbek, Tashkent, Uzbekistan
Dildora K. Khudayberganova, National University of Uzbekistan named after Mirzo Ulugbek, Tashkent, Uzbekistan

2024 (English)

Abstract

This paper presents an algorithm for recognizing named entities in Karakalpak-language texts on oceanology. The algorithm follows a dictionary-based approach, drawing on a database of dictionaries of tagged words. The database contains 10,500 words, 1,000 of which are oceanology-related named entities. The article also describes an affix-based morphological analysis, built into the algorithm, for words that are not found in the dictionaries. The algorithm was tested on three text corpora totaling 300 sentences. Testing showed high performance, with accuracy and recall ranging from 91% to 100%. The authors also review related scientific work and examine alternative or similar solutions that address the problem fully or in part. Finally, to give the reader context for the material and the problem, the paper includes background information on the Karakalpak language.
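The dictionary lookup with an affix-stripping fallback described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the dictionary entries, the LOC tag, the suffix list, and the names lookup and tag_sentence are all hypothetical, and greedy longest-suffix stripping is a simplification standing in for whatever morphological rules the paper actually uses.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy dictionary of tagged words. The paper's database holds
# about 10,500 words, 1,000 of which are oceanology named entities.
ENTITY_DICT: Dict[str, str] = {
    "aral": "LOC",
    "kaspiy": "LOC",
}

# Illustrative Karakalpak-style suffixes (an assumption, not the authors' list),
# sorted longest first so the longest matching suffix is tried first.
SUFFIXES: List[str] = sorted(
    ["lar", "ler", "nıń", "niń", "da", "de", "ta", "te", "ǵa", "ge", "qa", "ke"],
    key=len,
    reverse=True,
)


def lookup(word: str, max_strips: int = 3) -> Optional[Tuple[str, str]]:
    """Return (stem, tag) if the word, or a form with up to max_strips
    suffixes removed, is in the dictionary; otherwise return None."""
    candidate = word.lower()
    for _ in range(max_strips + 1):
        # 1. Direct dictionary lookup.
        if candidate in ENTITY_DICT:
            return candidate, ENTITY_DICT[candidate]
        # 2. Fallback: strip one known suffix, keeping a stem of at least 2 chars.
        for suffix in SUFFIXES:
            if candidate.endswith(suffix) and len(candidate) > len(suffix) + 1:
                candidate = candidate[: -len(suffix)]
                break
        else:
            return None  # no known suffix left to strip
    return None


def tag_sentence(sentence: str) -> List[Tuple[str, str]]:
    """Tag each whitespace-separated token; 'O' marks a non-entity."""
    tags: List[Tuple[str, str]] = []
    for token in sentence.split():
        word = token.strip(".,;:!?\"'()")
        if not word:
            continue  # token was punctuation only
        match = lookup(word)
        tags.append((word, match[1] if match else "O"))
    return tags


if __name__ == "__main__":
    print(tag_sentence("Aralda hám Kaspiyge."))
```

On the toy input, "Aralda" and "Kaspiyge" miss the dictionary directly, lose the suffixes "da" and "ge", and resolve to the stems "aral" and "kaspiy"; "hám" matches neither a dictionary entry nor a suffix and is tagged "O". A real system would also need multi-word entities and vowel-harmony-aware suffix rules, which this sketch omits.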

Topics

Data Quality and Management; Topic Modeling; Data Mining Algorithms and Applications

Identifiers

DOI: 10.1109/piere62470.2024.10804978

Citations and references

Cited by: 14 · References: 14

Metrics — AkademScholar · Coming soon