Multi-Task Learning for Uzbek News Text Classification
Abstract
We explore multi-task learning for Uzbek news text classification in a low-resource setting. We introduce a new dataset of more than sixteen thousand Uzbek news articles collected from Kun.uz and annotated for three supervised tasks: topic classification, author gender identification, and publication year prediction. Using a shared-encoder, multi-head architecture built on a pretrained Uzbek transformer model, we compare single-task baselines with several multi-task learning variants, including layer-wise aggregation and cross-task attention. Experimental results show that transformer-based models outperform classical baselines on all three tasks. Multi-task learning yields stable performance overall, with modest gains on topic classification and consistent results on gender and publication year prediction. Our analysis of confusion matrices and shared-representation visualizations shows that, for Uzbek news classification under low-resource conditions, task difficulty is determined mainly by how well the labels align with the underlying semantic representations rather than by architectural complexity. Our study provides the first systematic evaluation of multi-task learning for Uzbek news classification and contributes an annotated dataset and empirical insights for future low-resource natural language processing research.
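As a rough illustration of the shared-encoder, multi-head idea, the sketch below passes one input through a single shared layer and then through three task-specific heads, and sums the per-task cross-entropy losses. It is a toy, pure-Python stand-in and not the model evaluated here: the pretrained transformer encoder is replaced by one tanh layer, and the class name `MultiHeadClassifier`, the class counts (5 topics, 2 genders, 4 years), the dimensions, and the equal loss weights are all hypothetical choices made for the example.

```python
import math
import random


def linear(x, weights, bias):
    """Apply y = W x + b to a single input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one example."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases for an n_in -> n_out layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiHeadClassifier:
    """Toy shared encoder with one classification head per task (hypothetical)."""

    def __init__(self, input_dim, hidden_dim, task_classes, seed=0):
        rng = random.Random(seed)
        # Shared encoder: a stand-in for the pretrained transformer.
        self.encoder = init_layer(rng, hidden_dim, input_dim)
        # One linear head per task, all reading the same shared representation.
        self.heads = {task: init_layer(rng, n_classes, hidden_dim)
                      for task, n_classes in task_classes.items()}

    def forward(self, x):
        shared = [math.tanh(v) for v in linear(x, *self.encoder)]
        return {task: linear(shared, *head)
                for task, head in self.heads.items()}

    def joint_loss(self, x, labels, task_weights=None):
        """Weighted sum of per-task losses; equal weights are an assumption."""
        logits = self.forward(x)
        task_weights = task_weights or {task: 1.0 for task in labels}
        return sum(task_weights[task] * cross_entropy(logits[task], y)
                   for task, y in labels.items())


# Hypothetical label spaces; the real class counts are not given in this text.
model = MultiHeadClassifier(
    input_dim=8, hidden_dim=4,
    task_classes={"topic": 5, "gender": 2, "year": 4},
)
features = [0.1] * 8  # placeholder for an encoded article
logits = model.forward(features)
loss = model.joint_loss(features, {"topic": 0, "gender": 1, "year": 2})
```

A single-task baseline corresponds to calling `joint_loss` with only one entry in `labels`, so that the encoder is trained on that task's signal alone; the multi-task variants differ in how the heads read from the shared encoder, which this sketch does not attempt to reproduce.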