Article

A Hybrid Named-Entity Recognition Algorithm for Ecological Documentation in Uzbekistan

Nilufar Abdurakhmonova, National University of Uzbekistan named after Mirzo Ulugbek, Tashkent, Uzbekistan
Davlatyor Mengliev, Cyber University, Nurafshon, Uzbekistan
Sarvinoz Mardonova, Bukhara State University, Bukhara, Uzbekistan
Tulkin Asadov, Bukhara State University, Bukhara, Uzbekistan
Farxod Xalilov, National University of Uzbekistan named after Mirzo Ulugbek, Tashkent, Uzbekistan
Fayzilat Maxsudova, National University of Uzbekistan named after Mirzo Ulugbek, Tashkent, Uzbekistan
2025
ABI

Abstract

The rapid growth of environmental documentation in Uzbekistan, from parliamentary minutes to environmental impact assessment reports, calls for automated tools that can quickly identify key actors, territories, and institutions. However, existing named-entity recognition (NER) systems struggle with the dual script (Cyrillic and Latin) and the agglutinative morphology of Uzbek. This paper proposes a hybrid NER algorithm that combines two deterministic preprocessing modules with a spaCy statistical model. The orthographic standardization module converts Cyrillic input to a unified Latin script, normalizing apostrophes and Uzbek-specific letters. The morphological corrector performs lemmatization and corrects typical typos, removing the suffix "noise" that masks entity boundaries. For training, a corpus of 5,000 environmental sentences was compiled and annotated with the BIOES scheme for the PER, ORG, and LOC classes. Comparative testing shows that the hybrid approach raises the average F1 score by 4–7% relative to the baseline spaCy model. The algorithm is a step towards a reliable infrastructure for analyzing environmental data in Uzbek and for supporting decision-making on the country's sustainable development.
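
The two ideas that the abstract names most concretely, orthographic standardization and BIOES annotation, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the transliteration table is a simplified version of the Uzbek Cyrillic-to-Latin correspondence, the choice of U+02BB as the unified apostrophe is an assumption, and the names `normalize_script` and `bioes_tags` are hypothetical. The morphological corrector and the spaCy model are not shown.

```python
import re

OKINA = "\u02bb"  # assumed unified form of the mark in Latin "o'" and "g'"
APOSTROPHES = "'\u2018\u2019\u0060\u02bc"  # variants found in Latin-script text
CYR_E = "\u0435"  # Cyrillic small letter ie
CYR_VOWELS = "аеёиоуэюяў"

# Simplified table (an assumption for illustration, not the paper's rule set).
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "j", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "x", "ц": "ts", "ч": "ch", "ш": "sh", "ъ": "'", "ь": "",
    "э": "e", "ю": "yu", "я": "ya",
    "ў": "o" + OKINA, "қ": "q", "ғ": "g" + OKINA, "ҳ": "h",
}


def normalize_script(text: str) -> str:
    """Unify apostrophes after Latin o/g, then transliterate Cyrillic letters."""
    text = re.sub("([oOgG])[" + re.escape(APOSTROPHES) + "]",
                  lambda m: m.group(1) + OKINA, text)
    out, prev = [], ""
    for ch in text:
        low = ch.lower()
        if low == CYR_E and (not prev.isalpha() or prev.lower() in CYR_VOWELS):
            lat = "ye"  # word-initial or post-vocalic Cyrillic ie
        elif low in CYR_TO_LAT:
            lat = CYR_TO_LAT[low]
        else:
            out.append(ch)  # Latin letters, digits, punctuation pass through
            prev = ch
            continue
        # Only the first Latin letter is capitalized; all-caps words are not handled.
        out.append(lat.capitalize() if ch.isupper() else lat)
        prev = ch
    return "".join(out)


def bioes_tags(n_tokens: int, spans: list[tuple[int, int, str]]) -> list[str]:
    """Convert (start, end_exclusive, label) token spans into BIOES tags."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = "S-" + label  # single-token entity
        else:
            tags[start] = "B-" + label
            for i in range(start + 1, end - 1):
                tags[i] = "I-" + label
            tags[end - 1] = "E-" + label
    return tags
```

With this sketch, Cyrillic and Latin spellings of the same name map to one string, so a downstream model sees a single surface form. For the five tokens "Buxoro davlat universiteti Toshkent shahrida", `bioes_tags(5, [(0, 3, "ORG"), (3, 4, "LOC")])` marks the three-token organization with B/I/E tags and the one-token location with an S tag.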
