Article

Punctuation Restoration in Uzbek Texts Using POS Tagging Techniques

Elov Botir Boltayevich, Tashkent State University of Uzbek Language and Literature, Dept. of Computational Linguistics and Digital Technologies, Tashkent, Uzbekistan
Alavutdinova Nadira Ganiyevna, University of Uzbekistan, Dept. of Uzbek Linguistics National, Tashkent, Uzbekistan
Sobirova Zamigor Ganijon Kizi, Tashkent State University of Uzbek Language and Literature, Computational Linguistics and Digital Technologies, Tashkent, Uzbekistan
2025
ABI

Abstract

This paper addresses automatic punctuation restoration in Uzbek-language texts, using Part-of-Speech (POS) tagging as the core method for simplifying the task. Because Uzbek is an agglutinative language, its texts have complex morphological structure, and identifying word forms and their grammatical functions is critical to placing punctuation marks accurately. The study proposes a rule-based model that uses POS tagging to determine the syntactic role of each word while taking into account linguistic features specific to Uzbek. The approach is validated experimentally on real Uzbek text corpora using transformer-based models, in particular POS taggers adapted to the BERT architecture. The results indicate that identifying word function through POS tagging significantly improves both the accuracy and the quality of punctuation placement. The research contributes to Natural Language Processing (NLP) tools for Uzbek, especially in applications such as automatic text editing, sentence boundary detection, machine translation, and speech-to-text systems.
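To make the idea of rule-based punctuation placement over POS tags concrete, here is a minimal Python sketch. The abstract does not list the paper's rules or tag set, so everything here is an assumption made for illustration: the tag names (`VERB`, `CONJ`, and so on), the function `restore_punctuation`, and the three toy rules. The tags are also supplied by hand, whereas the paper obtains them from a BERT-based tagger. The rules rely on the fact that Uzbek is typically verb-final: a finite verb followed by a conjunction gets a comma, any other finite verb closes the sentence, and the end of the input is closed with a period.

```python
from typing import List, Tuple

# (word, POS tag). The tag names are hypothetical, not the paper's tag set.
Tagged = Tuple[str, str]


def restore_punctuation(tagged: List[Tagged]) -> str:
    """Insert commas and periods into unpunctuated text using toy POS rules.

    Illustrative rules only (assumptions, not the authors' rule set):
      1. A VERB followed by a CONJ gets a comma (clause boundary).
      2. Any other VERB ends the sentence, since Uzbek is usually verb-final.
      3. The last token always gets a period.
    """
    pieces: List[str] = []
    start_of_sentence = True
    for i, (word, tag) in enumerate(tagged):
        # Capitalize only the first letter of a sentence-initial word.
        text = word[:1].upper() + word[1:] if start_of_sentence else word
        start_of_sentence = False

        next_tag = tagged[i + 1][1] if i + 1 < len(tagged) else None
        if next_tag is None:
            text += "."                      # rule 3
        elif tag == "VERB" and next_tag == "CONJ":
            text += ","                      # rule 1
        elif tag == "VERB":
            text += "."                      # rule 2
            start_of_sentence = True
        pieces.append(text)
    return " ".join(pieces)


if __name__ == "__main__":
    # Hand-tagged input; a real pipeline would take tags from a POS tagger.
    sample = [
        ("men", "PRON"), ("keldim", "VERB"), ("lekin", "CONJ"),
        ("u", "PRON"), ("kelmadi", "VERB"),
        ("havo", "NOUN"), ("sovuq", "ADJ"),
    ]
    print(restore_punctuation(sample))
```

A realistic rule set would need many more cases (non-finite verb forms, homogeneous sentence members, question particles), and the quality of the output depends directly on the accuracy of the upstream tagger.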
