SentiUzNet: A Domain-Independent Sentiment Lexicon for Uzbek and Its Application in Machine Learning Models
Abstract
This paper presents the structure and principal components of the linguistic resources required for sentiment analysis in the Uzbek language. The research aims to identify and develop effective approaches for constructing a linguistic database, referred to as SentiUzNet, and to establish a foundational sentiment lexicon tailored to the characteristics of Uzbek. In particular, the paper discusses key principles for annotating words with sentiment polarity and subjectivity scores, as well as the methodological foundations for building a lexicographic database to support automated sentiment analysis of texts. A significant part of the research consists of experiments on large-scale user-generated content, specifically social media comments written in Uzbek. These datasets were used to train and evaluate sentiment analysis models, allowing an assessment of their performance and practical applicability. The results represent one of the first comprehensive attempts to facilitate automatic sentiment detection in Uzbek and are expected to contribute substantially to the advancement of natural language processing technologies in under-resourced linguistic settings.