Developing Rule-Based and Gazetteer Lists for Named Entity Recognition in Uzbek Language: Geographical Names
Abstract
Named entity recognition remains an open problem in natural language processing, particularly for low-resource languages such as Uzbek. In this work, the authors review existing research and propose a practical solution for recognizing named entities in Uzbek text. The approach is built on rules and gazetteer lists, with particular emphasis on identifying place names. Because linguistic resources for Uzbek are limited, the authors do not use machine learning methods; this makes the approach easy to implement and quick to scale. The approach rests on two algorithms. The first, based on morphological analysis, is simple but somewhat limited in its capabilities. The second, based on syntactic analysis of Uzbek text, offers a more flexible way to recognize named entities. Both algorithms apply a set of rules derived from a detailed study of the structure of the Uzbek language and its morphological and grammatical features. The study aims to enrich the relatively sparse literature on named entity recognition for Uzbek and to provide a solid basis for further research in this area.
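To make the idea of combining a gazetteer with morphological rules more concrete, the sketch below shows one simple way such a lookup could work. It is an illustration under stated assumptions, not the authors' algorithm: the gazetteer entries, the suffix list, and the function names `lemmatize_place` and `find_places` are all hypothetical. Each token is checked against the gazetteer as-is and then after stripping at most one Uzbek case suffix (for example, `Toshkentdan` "from Tashkent" reduces to `toshkent`).

```python
from __future__ import annotations

import re

# Hypothetical toy gazetteer of Uzbek place names (lowercase, Latin script).
GAZETTEER = {"toshkent", "samarqand", "buxoro", "andijon", "namangan", "xiva"}

# Illustrative subset of Uzbek case suffixes; an assumption, not an exhaustive rule set.
# Sorted longest-first so that, e.g., "dan" is tried before "da".
CASE_SUFFIXES = sorted(
    ["ning", "ni", "ga", "ka", "qa", "da", "ta", "dan", "tan"],
    key=len,
    reverse=True,
)

# Unicode-aware word tokenizer; a real system would need a proper Uzbek tokenizer.
TOKEN_RE = re.compile(r"\w+")


def lemmatize_place(token: str) -> str | None:
    """Return the gazetteer entry matched by token, stripping at most one case suffix."""
    word = token.lower()
    if word in GAZETTEER:
        return word
    for suffix in CASE_SUFFIXES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if stem in GAZETTEER:
                return stem
    return None


def find_places(text: str) -> list[tuple[str, str, int, int]]:
    """Return (surface form, gazetteer entry, start, end) for each recognized place name."""
    results = []
    for match in TOKEN_RE.finditer(text):
        lemma = lemmatize_place(match.group())
        if lemma is not None:
            results.append((match.group(), lemma, match.start(), match.end()))
    return results
```

For instance, `find_places("Men Toshkentdan Samarqandga bordim, Buxoro esa uzoq.")` recognizes `Toshkentdan`, `Samarqandga`, and `Buxoro` and maps them to their gazetteer entries. Trying longer suffixes first avoids stripping only `da` from a word that ends in `dan`. A purely morphological lookup like this cannot resolve ambiguity from sentence context, which is the kind of limitation the syntactic algorithm described above is meant to address.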