Article

Building a Sentence-Level Sentiment Classifier for Karakalpak: From Low-Resource Data to Emotion Trajectories

Zamira Yusupova, Tashkent University of Information Technologies Named After Muhammad al-Khwarizmi, Tashkent, Uzbekistan
Shukhrat Kayumov, Cyber University, Nurafshon, Uzbekistan
Bahodir Ibragimov, Urgench State University, Urgench, Uzbekistan
2025
ABI

Abstract

This paper presents a compact and fully interpretable algorithm for analyzing the sentiment and emotional dynamics of Karakalpak educational texts. In the first stage, we implement a sentence-level sentiment classifier for Karakalpak built on simple, reliable features (word and character n-grams) and logistic regression. The classifier outputs a continuous polarity for each sentence, ranging from -1 to +1. In the second stage, the sequence of polarity values is analyzed across the sentences of a text to extract three interpretable indices of its "emotional trajectory": volatility (the frequency and strength of polarity switches), stability/self-regulation (the proportion of "calm" sections and low amplitude), and balance/trend (the mean tone and how it changes toward the end). The pipeline includes script standardization (Latin script and apostrophes), lightweight lemmatization for productive suffixes, and accurate sentence segmentation, which is critical for an agglutinative language. The methodological contribution is the combination of an interpretable first-stage model with a deterministic, rule-based analysis of dynamics in the second stage. This design does not require large training corpora and is easy to replicate: every decision is accompanied by understandable "carriers," and essay-level aggregates are turned into short, practical recommendations for the writing teacher. We describe the annotation and evaluation protocols, discuss limitations (genre sensitivity, code-switching, polysemy), and outline future directions: expanding the corpus, refining the rules for negation and modifiers, soft transfer from related Turkic languages, and integrating lightweight neural modules as the data grows.
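The second stage can be illustrated with a minimal Python sketch that turns a list of per-sentence polarities into trajectory indices. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: the function name `trajectory_indices`, the `calm_threshold` parameter and its default of 0.2, the definition of a "switch" as a sign change between neighbouring sentences, and the trend as the mean of the last third minus the mean of the first third.

```python
from statistics import mean


def trajectory_indices(polarities, calm_threshold=0.2):
    """Summarize per-sentence polarities in [-1, 1] (illustrative definitions only)."""
    if len(polarities) < 2:
        raise ValueError("need at least two sentence polarities")

    pairs = list(zip(polarities, polarities[1:]))
    # Size of the polarity change between neighbouring sentences.
    steps = [abs(b - a) for a, b in pairs]
    # Assumed definition: a switch is a sign change between neighbours.
    switches = sum(1 for a, b in pairs if a * b < 0)
    # Assumed window for the trend: first and last third of the text.
    third = max(1, len(polarities) // 3)

    return {
        # Volatility: how often and how strongly the tone switches.
        "switch_rate": switches / len(steps),
        "mean_step": mean(steps),
        # Stability: share of "calm" transitions and overall amplitude.
        "calm_share": sum(1 for s in steps if s <= calm_threshold) / len(steps),
        "amplitude": max(polarities) - min(polarities),
        # Balance and trend: mean tone and its shift toward the end.
        "balance": mean(polarities),
        "trend": mean(polarities[-third:]) - mean(polarities[:third]),
    }


# Hypothetical polarities for a six-sentence essay.
essay = [0.5, 0.5, -0.5, -0.5, 0.5, 1.0]
indices = trajectory_indices(essay)
```

Because every index is a deterministic function of the polarity sequence, the same essay always yields the same values, which is in keeping with the rule-based second stage described above.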
