Article

Extracting Ecological Facts from Karakalpak Texts via Named Entity Recognition

Umidakhon Azimova, Tashkent State University of Economics, Tashkent, Uzbekistan
G Kasimova, Tashkent State University of Economics, Tashkent, Uzbekistan
Zarina Kazbekova, Tashkent State University of Economics, Tashkent, Uzbekistan
Boburjon I. Shermatov, Urgench State University, Urgench, Uzbekistan
Elmurod Urinov, Cyber University, Nurafshon, Uzbekistan
Nilufar Istamovna Adizova, Bukhara State University, Bukhara, Uzbekistan

2025


Abstract

Environmental monitoring, conservation journalism, and regulatory enforcement in Karakalpakstan depend on timely access to structured data, yet much of this information is buried in unstructured prose: field reports, local news, NGO newsletters, impact assessments, and community observations. The authors address this gap by developing an ecological named entity recognition (NER) system for Karakalpak texts, built on a compact, user-friendly schema with three high-value categories: species, locations, and conservation organizations. The task is complicated by digraphia (Cyrillic and Latin scripts), code-switching with Uzbek and Russian, species referred to both by Latin binomials and by local names, rich hydronymy and microtoponymy, and organizational aliases and abbreviations that change over time. The resulting system supports tracking species mentions over time, aggregating geocoded location mentions, and reliably identifying responsible agencies, which enables faster incident triage and closer integration with geospatial layers and field measurements. By restricting the schema to relevant entity types and documenting reproducible annotation guidelines and evaluation design, the work provides a reusable framework for ecological text analysis in resource-constrained settings and a template for extending NER to neighboring languages and ecological subdomains.
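The abstract names the three entity categories and the digraphia problem but does not specify a tagging format or a normalization procedure. The sketch below assumes a BIO tagging scheme over the three types and a simple Cyrillic-to-Latin fold, so that spellings of the same name in either script compare equal. The label names (SPECIES, LOCATION, ORG), the partial transliteration table (loosely based on the 2016 Karakalpak Latin alphabet), and the helper names to_latin and decode_bio are illustrative assumptions, not the authors' implementation.

```python
"""Illustrative sketch (assumption, not the authors' code): a three-type
ecological NER schema with Cyrillic/Latin folding and BIO span decoding."""

from dataclasses import dataclass
from typing import List, Optional

# Hypothetical label set for the three categories named in the abstract.
ENTITY_TYPES = ("SPECIES", "LOCATION", "ORG")

# Hypothetical, partial lowercase Cyrillic-to-Latin table, loosely following
# the 2016 Karakalpak Latin alphabet; letters not listed pass through as-is.
CYR2LAT = {
    "а": "a", "ә": "á", "б": "b", "в": "v", "г": "g", "ғ": "ǵ",
    "д": "d", "е": "e", "ж": "j", "з": "z", "и": "i", "й": "y",
    "к": "k", "қ": "q", "л": "l", "м": "m", "н": "n", "ң": "ń",
    "о": "o", "ө": "ó", "п": "p", "р": "r", "с": "s", "т": "t",
    "у": "u", "ү": "ú", "ў": "w", "ф": "f", "х": "x", "ҳ": "h",
    "ч": "ch", "ш": "sh", "ы": "ı", "ь": "", "ъ": "",
}


def to_latin(text: str) -> str:
    """Lowercase, then transliterate character by character so that Cyrillic
    and Latin spellings of the same name compare equal.

    Caveat: str.lower() maps Latin "I" to "i", not to Karakalpak dotless "ı",
    so a real normalizer would need Turkic-aware casing."""
    return "".join(CYR2LAT.get(ch, ch) for ch in text.lower())


@dataclass(frozen=True)
class Entity:
    label: str  # one of ENTITY_TYPES
    start: int  # index of the first token
    end: int    # exclusive end index
    text: str   # surface form as it appears in the tokens


def decode_bio(tokens: List[str], tags: List[str]) -> List[Entity]:
    """Convert per-token BIO tags (B-TYPE, I-TYPE, O) into entity spans.

    An I- tag that does not continue an open span of the same type starts
    a new span, a common lenient convention for noisy model output."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    entities: List[Entity] = []
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        prefix, _, typ = tag.partition("-")
        continues = prefix == "I" and typ == label
        if label is not None and not continues:
            # Close the open span at token i (exclusive).
            entities.append(Entity(label, start, i, " ".join(tokens[start:i])))
            label = None
        if prefix in ("B", "I") and not continues:
            if typ not in ENTITY_TYPES:
                raise ValueError(f"unknown entity type in tag {tag!r}")
            label, start = typ, i
    return entities


if __name__ == "__main__":
    # Toy, invented example: a Latin binomial plus a Cyrillic place name.
    tokens = ["Saiga", "tatarica", "seen", "in", "Қарақалпақстан"]
    tags = ["B-SPECIES", "I-SPECIES", "O", "O", "B-LOCATION"]
    for ent in decode_bio(tokens, tags):
        print(ent.label, ent.text, "->", to_latin(ent.text))
```

Folding scripts before matching is one way to merge Cyrillic and Latin mentions of the same hydronym or organization when counting mentions; a production system would presumably also need Uzbek and Russian spellings, alias tables for renamed organizations, and proper handling of the Turkic dotted and dotless i.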

Topics

Biomedical Text Mining and Ontologies; Semantic Web and Ontologies; Animal and Plant Science Education

Identifiers

DOI: 10.1109/apeie66761.2025.11289246

Citations and references

Citations: 0; References: 27
