Punctuation Restoration in Uzbek Texts Using POS Tagging Techniques
Abstract
This paper addresses automatic punctuation restoration in Uzbek-language texts and uses Part-of-Speech (POS) tagging as the core method for simplifying the task. Because Uzbek is an agglutinative language, its texts have complex morphological structures, so identifying word forms and their grammatical functions is critical for placing punctuation marks accurately. The study proposes a rule-based model that uses POS tags to determine the syntactic role of each word, taking the linguistic features specific to Uzbek into account. The approach is validated experimentally on real Uzbek text corpora using transformer-based models, in particular POS taggers adapted to the BERT architecture. The results show that identifying word functions through POS tagging significantly improves both the accuracy and the quality of punctuation placement. This research offers a practical technological solution for advancing Natural Language Processing (NLP) tools for Uzbek, especially in applications such as automatic text editing, sentence boundary detection, machine translation, and speech-to-text systems.
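To make the idea of POS-driven, rule-based punctuation placement concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' model. The input format (pairs of a word and a Universal Dependencies UPOS tag), the two rules, and the conjunction list COMMA_BEFORE are hypothetical illustrations. The first rule inserts a comma before the adversative conjunctions "lekin", "ammo", and "biroq" ("but, however"). The second rule relies on Uzbek's verb-final word order and closes the text with a period when its last token is a verb or an auxiliary.

```python
from typing import List, Tuple

# Hypothetical input format: (word, UPOS tag) pairs produced by some POS tagger.
TaggedToken = Tuple[str, str]

# Illustrative rule data only (an assumption, not the paper's rule set):
# adversative conjunctions that are conventionally preceded by a comma in Uzbek.
COMMA_BEFORE = {"lekin", "ammo", "biroq"}

# Tags treated as possible sentence-final predicates (Uzbek is verb-final).
FINAL_TAGS = {"VERB", "AUX"}


def restore_punctuation(tokens: List[TaggedToken]) -> str:
    """Insert commas and a final period using two simple POS-based rules."""
    out: List[str] = []
    for word, tag in tokens:
        # Rule 1: put a comma before an adversative coordinating conjunction,
        # unless the conjunction is the very first token.
        if tag == "CCONJ" and word.lower() in COMMA_BEFORE and out:
            out[-1] += ","
        out.append(word)
    # Rule 2: if the last token is a verb or auxiliary, end the sentence with a period.
    if tokens and tokens[-1][1] in FINAL_TAGS:
        out[-1] += "."
    return " ".join(out)


if __name__ == "__main__":
    sentence = [
        ("Men", "PRON"), ("uyga", "NOUN"), ("bordim", "VERB"),
        ("lekin", "CCONJ"), ("u", "PRON"), ("yo'q", "ADJ"), ("edi", "AUX"),
    ]
    print(restore_punctuation(sentence))  # Men uyga bordim, lekin u yo'q edi.
```

In a full system, the tags would come from a trained tagger such as the BERT-based POS taggers mentioned above. The rules would also need to handle converbs and other non-finite verb forms, which receive the VERB tag but do not end a sentence.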