Article

Rule-Based Punctuation Algorithm for the Uzbek Language

Maksud Sharipov, Urgench State University, Department of Computer Sciences, Urgench, Uzbekistan
Hushnudbek S. Adinaev, Urgench branch of Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Department of IT, Urgench, Uzbekistan
Elmurod Kuriyozov, Urgench State University, Department of Computer Sciences, Urgench, Uzbekistan

2024 · en

ABI

Abstract

Punctuation analysis occupies an important place in natural language processing: a model that can predict correct punctuation is needed in a number of tasks, such as text pre-processing, spell checking, grammar checking, and information retrieval. Predicting the right punctuation marks is context-dependent, which makes it non-trivial to apply language-independent, general-purpose punctuation generation tools to the job. Although such tools have already been built for many languages, Uzbek remains a low-resource language, and to our knowledge, punctuation analysis and prediction algorithms for Uzbek texts have not yet been developed. This paper proposes a rule-based algorithm and a model for punctuation analysis of periods and commas in Uzbek-language texts. While the major contribution of the paper is a rule-based algorithm for determining whether periods and commas are placed correctly in Uzbek text, the authors also present analysis results on a corpus covering various fields, and acknowledge the need for further work on the task, including machine learning and deep learning solutions. The proposed rule-based algorithm for punctuation analysis is intended not only to help with Uzbek texts, but also, the authors hope, to serve as a pivot point for other closely related Turkic languages.

Topics

Natural Language Processing Techniques; Translation Studies and Practices; Lexicography and Language Studies

Identifiers

DOI: 10.1109/edm61683.2024.10615061

Citations and references

8 citations, 10 references used
