Article

Automated Recognition of Named Entities and Dialect Standardization in Uzbek Legal Texts

Davlatyor Mengliev (Novosibirsk State University, Novosibirsk, Russia); Nilufar Abdurakhmonova (National University of Uzbekistan named after Mirzo Ulugbek, Department of Computer linguistics, Tashkent, Uzbekistan); Hasanboy Rahimov (Andijan State University, Andijan, Uzbekistan); Nikolai Yu. Zolotykh (Lobachevsky State University, Nizhni Novgorod, Russia); Alisher A. Ubaydullayev (National University of Uzbekistan named after Mirzo Ulugbek, Tashkent, Uzbekistan); Bahodir Ibragimov (Urgench State University, Department of Information Technologies, Urgench, Uzbekistan)

2024 · en

ABI

Abstract

This study presents a tool for identifying named entities in Uzbek legal texts. In addition to detecting named entities, the authors developed an algorithm that standardizes word forms by replacing detected dialect words (Karluk, Kypchak and Oghuz) with their formal equivalents. This helps correct grammatical mistakes that are common among native speakers from different regions of Uzbekistan. The proposed hybrid approach combines two components: a traditional dictionary-based method, used in preprocessing to standardize word forms with a dictionary of more than 10 thousand marked words, and a custom language model for named entity detection, trained on 2000 legal sentences. In testing, the language model detected named entities with an accuracy of 90% and a recall of 94%. The algorithm for standardizing dialect word forms showed even higher rates, ranging from 90% to 100% depending on the dialect.
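The dictionary-based preprocessing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon entries are placeholder strings rather than real Uzbek data, and the names `DIALECT_TO_STANDARD` and `standardize` are hypothetical. The sketch only assumes that each marked dictionary entry maps a dialect form to a formal form and a dialect label.

```python
import re

# Hypothetical lexicon: dialect form -> (formal form, dialect label).
# Placeholder entries only; the paper's dictionary holds 10,000+ marked words.
DIALECT_TO_STANDARD = {
    "dialectword1": ("standardword1", "Karluk"),
    "dialectword2": ("standardword2", "Kypchak"),
    "dialectword3": ("standardword3", "Oghuz"),
}

# Split into word runs, punctuation runs, and whitespace runs so that
# joining the tokens reproduces the input text exactly.
TOKEN_RE = re.compile(r"\w+|[^\w\s]+|\s+")


def standardize(text, lexicon=DIALECT_TO_STANDARD):
    """Replace known dialect forms with formal forms.

    Returns the rewritten text and a list of
    (original token, replacement, dialect label) tuples.
    """
    out = []
    replaced = []
    for token in TOKEN_RE.findall(text):
        entry = lexicon.get(token.lower())
        if entry is None:
            out.append(token)  # unknown word, punctuation, or whitespace
            continue
        standard, dialect = entry
        if token[0].isupper():
            # Keep sentence-initial capitalization.
            standard = standard.capitalize()
        out.append(standard)
        replaced.append((token, standard, dialect))
    return "".join(out), replaced


if __name__ == "__main__":
    fixed, log = standardize("Dialectword1 and dialectword2, ok.")
    print(fixed)
    print(log)
```

In a pipeline like the one the abstract outlines, the normalized text would then be passed to the named entity model. A pure lookup like this ignores suffixed word forms, which a real system for an agglutinative language such as Uzbek would also have to handle.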

Topics

Natural Language Processing Techniques; Translation Studies and Practices

Identifiers

DOI: 10.1109/piere62470.2024.10804942

Citations and references

Citations: 14 · References used: 14
