Article

A Morphology-Aware Neural Network Model for Uzbek Text Classification

Ulugbek Salaev, Urgench State University Named After Abu Rayhan Biruni, Department of Computer Science, Urgench, Uzbekistan
Gayrat Matlatipov, Urgench State University Named After Abu Rayhan Biruni, Department of Computer Science, Urgench, Uzbekistan
2025
ABI

Abstract

Text classification is a core task in Natural Language Processing (NLP), significantly enhanced by the integration of neural network models and linguistic preprocessing techniques. This study presents a neural network-based text classification model for the Uzbek language, incorporating morphological feature extraction to improve accuracy and generalization. Given the agglutinative nature of Uzbek, in which words exhibit extensive inflectional and derivational variation, standard word embedding models often encounter difficulties with out-of-vocabulary items and sparse vector representations. To address these issues, we apply morphological stemming to extract the root forms of words, thereby reducing vocabulary sparsity and improving feature representations. We evaluate multiple embedding approaches, including Word2Vec, FastText, and Bag-of-Words, on both the original and morphologically stemmed versions of an Uzbek news dataset. Additionally, we fine-tune the TahrirchiBERT model, a BERT-based neural network pretrained specifically for the Uzbek language, for comparative analysis. Experimental results demonstrate that morphology-aware embeddings consistently enhance classification performance, particularly for rare or highly inflected words. The findings underscore the importance of integrating morphological features in language modeling and provide a robust foundation for developing scalable Uzbek NLP applications in text categorization and related downstream tasks.
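
To illustrate why stemming reduces vocabulary sparsity in an agglutinative language, the following is a minimal sketch in Python. It is not the authors' stemmer: the suffix list, the minimum stem length, and the function names `toy_stem` and `bag_of_words` are assumptions made up for this illustration, and the suffix list covers only a few common plural and case endings. A real morphological analyzer for Uzbek would handle far more suffixes and avoid over-stemming.

```python
from collections import Counter

# Hypothetical, deliberately tiny suffix list (plural and a few case endings).
# This is an illustrative assumption, not the stemmer used in the paper.
SUFFIXES = ["ning", "lar", "dan", "ni", "ga", "da"]
MIN_STEM_LEN = 3  # assumed guard against stripping very short words


def toy_stem(word):
    """Repeatedly strip the longest matching suffix from the end of a word."""
    word = word.lower()
    stripped = True
    while stripped:
        stripped = False
        for suffix in sorted(SUFFIXES, key=len, reverse=True):
            if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
                word = word[: -len(suffix)]
                stripped = True
                break
    return word


def bag_of_words(tokens):
    """Count token occurrences (a plain Bag-of-Words representation)."""
    return Counter(tokens)


# Inflected forms of "kitob" (book) and "maktab" (school).
tokens = "kitob kitoblar kitobni kitoblarning maktabda maktabdan maktab".split()

raw_bow = bag_of_words(tokens)
stem_bow = bag_of_words(toy_stem(t) for t in tokens)

print("distinct surface forms:", len(raw_bow))
print("distinct stems:", len(stem_bow))
print(dict(stem_bow))
```

In this toy input, each inflected form is a separate feature in the raw Bag-of-Words, while after stemming the counts collapse onto the shared roots. The same idea applies when the stemmed text is fed to Word2Vec or FastText instead of a count-based representation.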
