Article

A Hybrid NER–Sentiment Model for Uzbek Texts: Integrating Lexical, Deep Learning, and Entity-Based Approaches

Bobur Saidov, Faculty of Mechanics and Mathematics, Novosibirsk State University, 1 Pirogova Str., Novosibirsk 630090, Russia
Vladimir Borisovich Barakhnin, Faculty of Mechanics and Mathematics, Novosibirsk State University, 1 Pirogova Str., Novosibirsk 630090, Russia
Rakhmon Saparbaev, Faculty of Telecommunications Technologies, Urgench State University, 14 Kh. Alimdjan Str., Urgench 220100, Uzbekistan
Zayniddin Narmuratov, Department of Foreign Languages, Termez University of Economics and Service, 4-b Farovon Str., Termez 190108, Uzbekistan
Rustamova Manzura, Department of Preschool and Primary Education Methodology, Kimyo International University in Tashkent, 156 Usmon Nosir Str., Tashkent 100121, Uzbekistan
Ruzmetova Zilolakhon, Department of Western Languages and Literature, Mamun University, 2 Bolkhovuz Str., Khiva 220900, Uzbekistan
Anorgul Atajanova, Department of Primary Education Methodology, Urgench State Pedagogical Institute, 1-A Gurlan Str., Urgench 220900, Uzbekistan

Big Data and Cognitive Computing (journal), 2026, English


Abstract

This work proposes a hybrid Uzbek sentiment analysis model (sometimes referred to as tonality analysis in the local literature) that integrates contextual text representations with named-entity information from an NER module and emoji-based emotional cues that are common in short online messages. To provide a comprehensive baseline comparison, we evaluate seven approaches—SVM, LSTM, mBERT, XLM-RoBERTa-base, mDeBERTa-v3, LaBSE, and the proposed hybrid model—covering both classical machine learning and modern multilingual transformer architectures for low-resource sentiment tasks. The overall pipeline begins with Uzbek-specific text normalization to reduce noise from informal spellings, transliteration variants, and inconsistent apostrophe usage. In parallel, the system performs explicit emoji extraction to capture affective signals that are often expressed non-verbally in social media texts. Next, we construct three complementary feature streams: a context encoder for sentence-level semantics, NER-driven entity features that encode entity mentions and types, and an emotion module that models emoji priors and their interaction with contextual meaning. These streams are fused into a unified representation and fed to a final classifier to predict sentiment polarity. Experiments on an Uzbek test set demonstrate that the hybrid model reaches an F1-score of 0.92, consistently outperforming text-only baselines. The results indicate that entity-aware and emoji-informed features improve robustness under sarcasm/irony, mixed sentiment with multiple targets, and orthographic noise, making the approach suitable for social media analytics, public opinion monitoring, customer feedback triage, and recommendation-oriented text mining.
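The front end of the pipeline described above (apostrophe normalization, emoji extraction, and fusion of the three feature streams) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The set of apostrophe variants, the canonical apostrophe, the emoji ranges, the `EMOJI_PRIOR` values, and all function names are assumptions made for illustration, and plain concatenation stands in for whatever fusion the model actually uses.

```python
import re

# Assumption: apostrophe-like characters seen in informal Uzbek Latin text.
# All are mapped to one canonical form (U+02BB, as in "o\u02bb" and "g\u02bb").
# This is a simplification: it also folds the tutuq belgisi (U+02BC) into U+02BB.
APOSTROPHE_VARIANTS = "'`\u2018\u2019\u02bc\u00b4"
CANONICAL_APOSTROPHE = "\u02bb"
_APOS_TABLE = str.maketrans({ch: CANONICAL_APOSTROPHE for ch in APOSTROPHE_VARIANTS})

# Approximate emoji ranges (not exhaustive; ZWJ sequences and variation
# selectors are not handled): pictographs, misc symbols, dingbats.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

# Hypothetical emoji polarity priors in [-1, 1]; the values are illustrative only.
EMOJI_PRIOR = {"\U0001F60A": 0.8, "\U0001F621": -0.9, "\U0001F44D": 0.7}


def extract_emojis(text):
    """Return (text with emojis replaced by spaces, list of emojis found)."""
    emojis = EMOJI_RE.findall(text)
    return EMOJI_RE.sub(" ", text), emojis


def normalize(text):
    """Unify apostrophe variants and collapse runs of whitespace."""
    text = text.translate(_APOS_TABLE)
    return re.sub(r"\s+", " ", text).strip()


def emoji_features(emojis):
    """Two features: mean prior over known emojis, and total emoji count."""
    scores = [EMOJI_PRIOR[e] for e in emojis if e in EMOJI_PRIOR]
    mean = sum(scores) / len(scores) if scores else 0.0
    return [mean, float(len(emojis))]


def fuse(context_vec, entity_vec, emoji_vec):
    """Fuse the three streams by simple concatenation (an assumed strategy)."""
    return list(context_vec) + list(entity_vec) + list(emoji_vec)


if __name__ == "__main__":
    raw = "O'zbekiston  juda go`zal \U0001F60A"
    stripped, emojis = extract_emojis(raw)
    clean = normalize(stripped)
    # Placeholder vectors stand in for the context encoder and NER outputs.
    fused = fuse([0.1, 0.2], [1.0, 0.0], emoji_features(emojis))
    print(clean, emojis, fused)
```

In a full system the placeholder context and entity vectors would come from a transformer encoder and the NER module, and the fused vector would be passed to the final classifier. Extracting emojis before normalization keeps the affective signal separate from the cleaned text, matching the parallel arrangement the abstract describes.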

Topics

Sentiment Analysis and Opinion Mining; Emotion and Mood Recognition; Mental Health via Writing

Identifiers

DOI: 10.3390/bdcc10030092

Citations and references

Cited by 10 references
