Restoring Punctuation in Uzbek Texts Using LLM Fine-Tuning Approaches
Abstract
This study examines the use of fine-tuned transformer-based models for detecting and restoring punctuation marks in Uzbek texts. Using the BERT and XLM-RoBERTa architectures, the models were trained to predict four punctuation marks: commas, full stops, exclamation marks and question marks. The study also develops a specialized dataset covering key aspects of Uzbek punctuation, which supports further focused research in this area. The models yielded good results in tests, confirming the potential of transformer models for punctuation restoration in low-resource languages, including Uzbek. The paper also focuses on the development of tokenizers specialized exclusively for Uzbek. Methods for restoring punctuation marks in Uzbek and for improving model accuracy in the case of class label mismatch are discussed as directions for future research. The mean F1 score for predicting the four punctuation marks and whether the word following a punctuation mark is uppercase or lowercase was 87.9%.
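As an illustration of the task described above, the following minimal sketch shows one way training labels for punctuation and casing could be derived from already punctuated text. The label names, the `make_labels` helper and the per-word labelling scheme are assumptions made for this example and are not taken from the study itself.

```python
# Assumed label set: the four marks named in the abstract, plus "O" for "no mark".
PUNCT_LABELS = {",": "COMMA", ".": "PERIOD", "!": "EXCLAMATION", "?": "QUESTION"}


def make_labels(text):
    """Turn punctuated text into (lowercased word, punctuation label, case label) triples.

    The punctuation label describes the mark that follows the word;
    the case label describes the word's own first letter.
    """
    examples = []
    for raw in text.split():
        # Split a single trailing punctuation mark off the word, if present.
        mark = raw[-1] if raw[-1] in PUNCT_LABELS else ""
        word = raw[:-1] if mark else raw
        if not word:
            # Skip tokens that consist only of a punctuation mark.
            continue
        punct = PUNCT_LABELS.get(mark, "O")
        case = "UPPER" if word[0].isupper() else "LOWER"
        # The model input is the lowercased, unpunctuated word.
        examples.append((word.lower(), punct, case))
    return examples


if __name__ == "__main__":
    for triple in make_labels("Salom, dunyo! Bugun havo yaxshi."):
        print(triple)
```

Under this assumed scheme, the casing of the word that follows a punctuation mark is read from the next triple in the sequence, so a token-classification model could learn both targets from the same unpunctuated, lowercased input.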