Article

Building a Sentence-Level Sentiment Classifier for Karakalpak: From Low-Resource Data to Emotion Trajectories

Zamira Yusupova, Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Tashkent, Uzbekistan
Shukhrat Kayumov, Cyber University, Nurafshon, Uzbekistan
Bahodir Ibragimov, Urgench State University, Urgench, Uzbekistan
2025
ABI

Abstract

This paper presents a compact and fully interpretable algorithm for analyzing the sentiment and emotional dynamics of Karakalpak educational texts. In the first stage, a sentence-level sentiment classifier for Karakalpak is implemented using simple and reliable features (word and character n-grams) and logistic regression. The classifier outputs a continuous polarity score for each sentence, ranging from -1 to +1. In the second stage, the sequence of polarity values across sentences is analyzed to extract three interpretable indices of the "emotional trajectory": volatility (frequency and strength of polarity switches), stability/self-regulation (proportion of "calm" sections and low amplitude), and balance/trend (average tone and its change toward the end of the text). The pipeline includes script standardization (Latin script and apostrophe variants), lightweight lemmatization for productive suffixes, and accurate sentence segmentation, which is critical for an agglutinative language. The methodological contribution lies in combining an explainable first-stage model with a deterministic, rule-based analysis of dynamics in the second stage. This design does not require large training corpora and is easily replicable: every decision is accompanied by interpretable "carriers," and essay-level aggregates are converted into short, practical recommendations for the writing teacher. We describe the annotation and evaluation protocols, discuss limitations (genre sensitivity, code-switching, polysemy), and outline future directions: expanding the corpus, refining the rules for negation and modifiers, soft transfer from related Turkic languages, and integrating lightweight neural modules as the data grows.
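
The second stage can be illustrated with a minimal Python sketch that takes per-sentence polarities in [-1, 1] and derives three groups of trajectory indices. The abstract does not give the formulas, so everything here is an assumption made for illustration: the function name `trajectory_indices`, the `calm_threshold` and `tail_fraction` parameters and their defaults, and the specific definitions (sign switches between adjacent sentences, mean absolute change, share of small changes, range, and tail mean minus overall mean). It is not the authors' implementation.

```python
from statistics import mean


def trajectory_indices(polarities, calm_threshold=0.2, tail_fraction=0.25):
    """Illustrative trajectory indices for a list of sentence polarities in [-1, 1].

    All formulas and default thresholds are assumptions for this sketch.
    """
    if len(polarities) < 2:
        raise ValueError("need at least two sentence polarities")

    pairs = list(zip(polarities, polarities[1:]))
    # Change in polarity between each pair of adjacent sentences.
    deltas = [b - a for a, b in pairs]

    # Volatility: how often the sign flips and how large the changes are.
    switches = sum(1 for a, b in pairs if a * b < 0)
    volatility = {
        "switch_rate": switches / len(deltas),
        "mean_abs_change": mean(abs(d) for d in deltas),
    }

    # Stability: share of "calm" transitions and overall amplitude (range).
    calm = sum(1 for d in deltas if abs(d) <= calm_threshold)
    stability = {
        "calm_share": calm / len(deltas),
        "amplitude": max(polarities) - min(polarities),
    }

    # Balance/trend: average tone and how the final part differs from it.
    tail_len = max(1, round(len(polarities) * tail_fraction))
    overall = mean(polarities)
    balance = {
        "mean_polarity": overall,
        "end_shift": mean(polarities[-tail_len:]) - overall,
    }

    return {"volatility": volatility, "stability": stability, "balance": balance}


if __name__ == "__main__":
    # Hypothetical polarities for a four-sentence essay.
    print(trajectory_indices([0.5, 0.5, -0.5, 0.5]))
```

Because every index is a deterministic function of the polarity sequence, the same essay always yields the same values, which matches the rule-based character of the second stage described above.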

