Article

Punctuation Restoration in Uzbek Texts Using POS Tagging Techniques

Elov Botir Boltayevich, Tashkent State University of Uzbek Language and Literature, Dept. of Computational Linguistics and Digital Technologies, Tashkent, Uzbekistan

Alavutdinova Nadira Ganiyevna, University of Uzbekistan, Dept. of Uzbek Linguistics National, Tashkent, Uzbekistan

Sobirova Zamigor Ganijon Kizi, Tashkent State University of Uzbek Language and Literature, Computational Linguistics and Digital Technologies, Tashkent, Uzbekistan

2025

Abstract

This paper addresses automatic punctuation restoration in Uzbek-language texts, using Part-of-Speech (POS) tagging as the core method for simplifying the task. Because Uzbek is agglutinative, its texts exhibit complex morphological structures, and identifying word forms and their grammatical functions is critical to placing punctuation marks accurately. The study proposes a rule-based model that uses POS tagging to determine the syntactic role of each word while accounting for linguistic features specific to Uzbek. The approach is validated experimentally on real Uzbek text corpora using transformer-based models, in particular POS taggers adapted to the BERT architecture. The results indicate that identifying word function through POS tagging significantly improves the accuracy and quality of punctuation placement. The work contributes to Natural Language Processing (NLP) tools for Uzbek, especially in applications such as automatic text editing, sentence boundary detection, machine translation, and speech-to-text systems.
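
To make the idea of POS-driven punctuation rules concrete, the following is a minimal illustrative sketch and not the authors' model. It assumes a hypothetical toy lexicon (`TOY_LEXICON`) in place of a trained BERT-based tagger, a hypothetical coarse tagset, and two simplified rules: a comma before a contrastive conjunction (`lekin`, `ammo`, `biroq`), and a period after a verb, which relies on the verb-final word order typical of Uzbek. Case restoration and all other punctuation marks are out of scope.

```python
from typing import List, Optional, Tuple

# Hypothetical toy lexicon standing in for a trained POS tagger.
TOY_LEXICON = {
    "men": "PRON", "u": "PRON", "kitob": "NOUN",
    "o'qidim": "VERB", "uxladi": "VERB", "keldi": "VERB",
    "lekin": "CCONJ", "ammo": "CCONJ", "biroq": "CCONJ",
}

# Contrastive conjunctions that take a comma before them.
CONTRASTIVE = {"lekin", "ammo", "biroq"}


def tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Assign a coarse POS tag to each token; unknown words get 'X'."""
    return [(t, TOY_LEXICON.get(t.lower(), "X")) for t in tokens]


def restore_punctuation(tokens: List[str]) -> str:
    """Insert commas and periods into an unpunctuated token sequence."""
    tagged = tag(tokens)
    out = []
    for i, (tok, pos) in enumerate(tagged):
        nxt: Optional[Tuple[str, str]] = (
            tagged[i + 1] if i + 1 < len(tagged) else None
        )
        if nxt is None:
            # Rule 0: the last token always closes the sentence.
            tok += "."
        elif nxt[0].lower() in CONTRASTIVE:
            # Rule 1: comma before a contrastive conjunction.
            tok += ","
        elif pos == "VERB" and nxt[1] != "CCONJ":
            # Rule 2: a verb followed by a non-conjunction ends a sentence.
            tok += "."
        out.append(tok)
    return " ".join(out)


if __name__ == "__main__":
    print(restore_punctuation("men kitob o'qidim lekin u uxladi".split()))
    print(restore_punctuation("u keldi men kitob o'qidim".split()))
```

In a realistic pipeline the `tag` function would be replaced by a trained tagger, and the rules would need to distinguish finite verbs from converbs and participles, which this sketch does not attempt.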

Topics

Natural Language Processing Techniques; Educational Technology and Pedagogy; Economic and Industrial Development

Identifiers

DOI: 10.1109/ubmk67458.2025.11206900

Citations and references

Cited by 01 references
