Deep Learning for Low-Resource Language: Sentiment Analysis of Karakalpak Texts in Energy Sector by Fine-Tuning mBERT
Abstract
This article addresses sentiment analysis of Karakalpak-language texts in the energy sector. Most existing solutions target widely spoken world languages, and adapting them to low-resource languages (Karakalpak, Uzbek, etc.) remains difficult, and in some cases impossible, because of differences in the nature of the languages. At the same time, no solutions focus on the Karakalpak language, especially in the context of user reviews in the energy sector. The authors propose a solution based on a neural network model (fine-tuned mBERT), trained on an annotated corpus of Karakalpak texts consisting of complaints, suggestions, and general comments covering the period from 2019 to 2024. Experimental evaluation gives accuracy, recall, and F1 score of 93%, 94%, and 93.5%, respectively. The results can be applied not only to sentiment analysis of consumer feedback in the electric power industry, but also to other related areas of the economy.
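As a quick consistency check on the reported figures, the F1 score is the harmonic mean of precision and recall. The sketch below assumes the 93% figure can be read as precision; this is an assumption, since the abstract labels it accuracy. The function name `f1_score` is illustrative and is not taken from the article.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Avoid division by zero when both inputs are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Assumption: 0.93 is treated as precision, 0.94 is the reported recall.
f1 = f1_score(0.93, 0.94)
print(f"F1 = {f1:.3f}")
```

Under that reading, the result rounds to the 93.5% stated in the abstract.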