Comparative Analysis of Named Entity Recognition Models for Russian and Uzbek
Abstract
This study compares named entity recognition (NER) systems for Russian and Uzbek. Russian NER rests on six established datasets and on the transformer models Slovnet BERT NER and DeepPavlov RuBERT-CRF, whose F1 reaches roughly 0.92, whereas Uzbek resources have appeared only since 2023 and remain an order of magnitude smaller. We examine the UZNER and BERTbek corpora, the Mengliev datasets, F1 figures on the WikiANN and XTREME benchmarks, and typological obstacles such as agglutination and the dual (Latin and Cyrillic) script. For Uzbek, data quality outweighs sheer size, and a single-number comparison of the two languages is misleading because their annotation schemes differ.