Dictionary-Based Medical Text Analysis in Uzbek: Overcoming the Low-Resource Challenge
Abstract
In the rapidly developing field of computational linguistics, processing low-resource languages remains difficult. The difficulty increases for medical texts, where an algorithm must do more than capture the general context of the source text. This article proposes an algorithm for recognizing named entities (symptoms and medications) in medical texts written in Uzbek, a low-resource language. The algorithm first segments the text into sentences and word forms, then looks up each word form in a medical dictionary. Words that are not found are passed to morphological analysis, and the resulting roots are compared with a dictionary of word roots. The proposed approach not only speeds up the recognition of medical entities but also minimizes redundancy and ensures data integrity. By combining traditional linguistic methods with computational ones, this research offers a robust solution for efficient recognition of medical named entities in languages with limited available resources.
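The pipeline described in the abstract (segmentation, dictionary lookup, then a morphological fallback against a root dictionary) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the article: the toy dictionaries `MEDICAL_DICT` and `ROOT_DICT`, the labels, the short list of Uzbek case and plural suffixes, and the naive suffix stripping that stands in for real morphological analysis.

```python
import re

# Hypothetical toy lexicons; a real system would load curated medical dictionaries.
MEDICAL_DICT = {"isitma": "SYMPTOM", "aspirin": "MEDICATION"}
ROOT_DICT = {"yo'tal": "SYMPTOM", "paratsetamol": "MEDICATION"}

# Illustrative subset of Uzbek suffixes, longest first (assumption, not exhaustive).
SUFFIXES = ("ning", "lar", "dan", "ni", "ga", "da")
MIN_ROOT_LEN = 3  # avoid stripping very short words down to nothing

WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def segment(text):
    """Split text into sentences, then each sentence into lowercase word forms."""
    # Normalize the apostrophe variants used in Uzbek Latin script (o', g').
    text = text.replace("\u02bb", "'").replace("\u2019", "'")
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [WORD_RE.findall(s.lower()) for s in sentences]


def find_root(word):
    """Strip known suffixes one at a time until a dictionary root is found."""
    current = word
    while True:
        for suffix in SUFFIXES:
            if current.endswith(suffix) and len(current) - len(suffix) >= MIN_ROOT_LEN:
                current = current[: -len(suffix)]
                break
        else:
            return None  # nothing left to strip and no root matched
        if current in ROOT_DICT:
            return current


def recognize_entities(text):
    """Return (word form, matched dictionary entry, label) for each entity found."""
    entities = []
    for words in segment(text):
        for word in words:
            if word in MEDICAL_DICT:  # step 1: direct dictionary lookup
                entities.append((word, word, MEDICAL_DICT[word]))
                continue
            root = find_root(word)  # step 2: morphological fallback
            if root is not None:
                entities.append((word, root, ROOT_DICT[root]))
    return entities
```

Calling `recognize_entities("Bemorda isitma bor. Unga paratsetamolni berdik.")` on this toy data matches `isitma` directly and recovers `paratsetamol` from the inflected form `paratsetamolni`. Multi-word terms and ambiguous suffixes are deliberately left out of the sketch.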