Article

Dictionary-Based Medical Text Analysis in Uzbek: Overcoming the Low-Resource Challenge

Davlatyor Mengliev, Novosibirsk State University, IT Department, Novosibirsk, Russia
Vladimir Barakhnin, Urgench branch of Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, IT Department, Urgench, Uzbekistan
Mukhriddin Eshkulov, Jizzakh polytechnic institute, Department of physics, Jizzakh, Uzbekistan
Bozorboy Palvanov, Urgench branch of Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Department of digital education technologies, Urgench, Uzbekistan
Nilufar Abdurakhmonova, National University of Uzbekistan, Department of Computational and applied linguistics, Tashkent, Uzbekistan
Saida Khamraeva, Urgench branch of Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, IT Department, Urgench, Uzbekistan

2023 · en

ABI

Abstract

In the rapidly developing field of computational linguistics, processing low-resource languages poses particular difficulties. The problem becomes harder still in medical text processing, where an algorithm must do more subtle work than simply understanding the context of the source text. This article proposes an algorithm for recognizing named entities (symptoms and medications) in medical texts in Uzbek, which is considered a low-resource language. The algorithm first segments the text into sentences and word forms, then compares each word of the source text against a medical dictionary. Words that are not found are subjected to morphological analysis and compared against a dictionary of word roots. The proposed approach not only speeds up the recognition of medical entities but also minimizes redundancy and ensures data integrity. By integrating traditional linguistic methodologies with computational methods, this research offers a robust solution for efficient recognition of medical named entities in languages with limited available resources.
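The pipeline outlined in the abstract (sentence and word segmentation, lookup in a medical dictionary, then morphological analysis with a root-dictionary lookup for unmatched words) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the dictionary entries, the suffix list, the label names, and the function names are hypothetical, and naive suffix stripping stands in for whatever morphological analysis the authors actually use.

```python
import re

# Hypothetical toy lexicons; the authors' dictionaries are not reproduced here.
MEDICAL_DICT = {"isitma": "SYMPTOM", "aspirin": "MEDICATION"}
ROOT_DICT = {"isitma": "SYMPTOM", "yo'tal": "SYMPTOM", "paratsetamol": "MEDICATION"}

# A small assumed sample of Uzbek suffixes, longest first.
SUFFIXES = ("ning", "lar", "dan", "ni", "ga", "da")


def split_sentences(text):
    """Split text into sentences at whitespace that follows ., ! or ?."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lowercase a sentence and return its word forms."""
    return re.findall(r"[a-z']+", sentence.lower())


def strip_suffixes(word):
    """Return candidate stems obtained by repeatedly removing known suffixes.

    This is a crude stand-in for real morphological analysis.
    """
    stems = []
    current = word
    stripped = True
    while stripped:
        stripped = False
        for suffix in SUFFIXES:
            # Keep at least three characters so short words are not destroyed.
            if current.endswith(suffix) and len(current) - len(suffix) >= 3:
                current = current[: -len(suffix)]
                stems.append(current)
                stripped = True
                break
    return stems


def recognize(text):
    """Return (sentence_index, word_form, matched_entry, label) tuples."""
    entities = []
    for index, sentence in enumerate(split_sentences(text)):
        for token in tokenize(sentence):
            # Step 1: direct lookup of the word form in the medical dictionary.
            if token in MEDICAL_DICT:
                entities.append((index, token, token, MEDICAL_DICT[token]))
                continue
            # Step 2: fall back to stems and the dictionary of word roots.
            for stem in strip_suffixes(token):
                if stem in ROOT_DICT:
                    entities.append((index, token, stem, ROOT_DICT[stem]))
                    break
    return entities


if __name__ == "__main__":
    for entity in recognize("Bemorda isitma bor. U paratsetamolni ichdi."):
        print(entity)
```

In this sketch the direct lookup handles word forms that appear in the dictionary verbatim, while the fallback lets an inflected form such as "paratsetamolni" match the root "paratsetamol". A real system would need a proper Uzbek morphological analyzer and handling of multi-word terms, neither of which is attempted here.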

Topics

Translation Studies and Practices; Natural Language Processing Techniques

Identifiers

DOI: 10.1109/csgb60362.2023.10329819

Citations and references

Citations: 11 References used: 13
