Article

A Hybrid Named-Entity Recognition Algorithm for Ecological Documentation in Uzbekistan

Nilufar Abdurakhmonova, National University of Uzbekistan named after Mirzo Ulugbek, Tashkent, Uzbekistan

Davlatyor Mengliev, Cyber University, Nurafshon, Uzbekistan

Sarvinoz Mardonova, Bukhara State University, Bukhara, Uzbekistan

Tulkin Asadov, Bukhara State University, Bukhara, Uzbekistan

Farxod Xalilov, National University of Uzbekistan named after Mirzo Ulugbek, Tashkent, Uzbekistan

Fayzilat Maxsudova, National University of Uzbekistan named after Mirzo Ulugbek, Tashkent, Uzbekistan

2025

ABI

Abstract

The rapid expansion of environmental documentation in Uzbekistan, from parliamentary minutes to environmental impact assessment reports, calls for automated tools that can quickly identify key actors, territories, and institutions. However, existing named-entity recognition systems struggle with the dual script (Cyrillic ↔ Latin) and the agglutinative morphology of the Uzbek language. This paper proposes a hybrid NER algorithm that combines two deterministic preprocessing modules with a spaCy statistical model. The orthographic standardization module converts Cyrillic input to a unified Latin script, normalizing apostrophes and language-specific letters. The morphological corrector performs lemmatization and corrects typical typos, removing the suffix "noise" that masks entity boundaries. For training, a corpus of 5,000 environmental sentences was compiled and annotated with the BIOES scheme for the PER, ORG, and LOC classes. Comparative testing shows that the hybrid approach increases the average F1-measure by 4–7% relative to the baseline spaCy model. The developed algorithm is a step towards a reliable infrastructure for analyzing environmental data in the Uzbek language and supports decision-making on the country's sustainable development.
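The orthographic standardization step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the mapping table is a simplified version of the Uzbek Cyrillic-to-Latin correspondence, the rule for word-initial "е" and the handling of "ц" are simplifying assumptions, and the names `CYR_TO_LAT`, `normalize_apostrophes`, `transliterate`, and `standardize` are hypothetical.

```python
# Simplified Uzbek Cyrillic -> Latin table (an assumption, not the paper's table).
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "j", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "x", "ц": "ts", "ч": "ch", "ш": "sh", "ъ": "\u02bc",
    "ь": "", "э": "e", "ю": "yu", "я": "ya",
    "ў": "o\u02bb", "қ": "q", "ғ": "g\u02bb", "ҳ": "h",
}

# Apostrophe look-alikes commonly typed in Latin-script Uzbek text.
APOSTROPHE_VARIANTS = "'`\u2018\u2019"


def normalize_apostrophes(text: str) -> str:
    """Map apostrophe variants to U+02BB after o/g, otherwise to U+02BC."""
    out = []
    for ch in text:
        if ch in APOSTROPHE_VARIANTS:
            prev = out[-1].lower() if out else ""
            out.append("\u02bb" if prev in ("o", "g") else "\u02bc")
        else:
            out.append(ch)
    return "".join(out)


def transliterate(text: str) -> str:
    """Convert Cyrillic letters to Latin; leave every other character as-is."""
    out = []
    for i, ch in enumerate(text):
        lower = ch.lower()
        if lower not in CYR_TO_LAT:
            out.append(ch)
            continue
        lat = CYR_TO_LAT[lower]
        # Assumed rule: "е" at the start of a word is written "ye".
        if lower == "е" and (i == 0 or not text[i - 1].isalpha()):
            lat = "ye"
        if ch.isupper():
            lat = lat.capitalize()
        out.append(lat)
    return "".join(out)


def standardize(text: str) -> str:
    """Unify script and apostrophes before the text reaches the NER model."""
    return transliterate(normalize_apostrophes(text))


if __name__ == "__main__":
    for sample in ["Ўзбекистон", "O'zbekiston", "Қашқадарё вилояти"]:
        print(standardize(sample))
```

Under these assumptions, the Cyrillic spelling "Ўзбекистон" and the Latin spelling "O'zbekiston" are reduced to the same string, so a downstream statistical model sees one surface form per entity instead of several.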

Topics

Web Data Mining and Analysis; Text and Document Classification Technologies; Topic Modeling

Identifiers

DOI: 10.1109/ubmk67458.2025.11206975

Citations and references

Cited by: 0

References: 10
