Article

Human-Reviewed Uzbek Legal Named Entity Recognition Dataset

Bobur R. Saidov (Urgench State University named after Abu Rayhan Biruni, Urgench, Khorezm Province)
Zarnigor Fayzullaeva (Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Tashkent, Tashkent Province)
Umida Bazarova (Navoi State University, Navoi)
Gulnoza Narkabilova (Fergana State University, Fergana)
Насиба Азизова (Karshi State University, Qarshi, Kashkadarya Province)
Feruzakhon Rustamova (Andijan State Medical Institute)
Firuza Halimova (Samarkand State Institute of Foreign Languages, Samarkand)

F1000Research (journal), 2026, English

Abstract

This article describes a human-reviewed Uzbek legal-domain named entity recognition (NER) dataset developed as a reusable resource for low-resource legal NLP. The release contains 12 entity categories: PER, ORG, LOC, DATE, MONEY, POSITION, DOCNO, LAW, COURT, BANK, TIN, and CADASTRE. The dataset is provided in XLSX, CSV, JSON, and JSONL formats and is structured into two complementary layers: a core subset of manually reviewable source-grounded records and an extended augmented subset used to support lower-frequency labels in training-oriented settings. The package also includes supporting documentation, split guidance, a data dictionary, and review-related metadata, including provenance, verification status, and quality flags. Character-level start and end offsets are included where recoverable. The release is intended to facilitate Uzbek legal NER research, resource curation, and transparent reuse under provenance-aware conditions.
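As an illustration of how a consumer of the JSONL release might sanity-check records against the 12 categories and the character offsets mentioned in the abstract, the following is a minimal sketch. The field names (`text`, `entities`, `surface`, `label`, `start`, `end`), the end-exclusive offset convention, and the sample sentences are assumptions made for this example; the actual schema is defined by the data dictionary shipped with the dataset.

```python
import json

# The 12 entity categories listed in the abstract.
LABELS = {"PER", "ORG", "LOC", "DATE", "MONEY", "POSITION",
          "DOCNO", "LAW", "COURT", "BANK", "TIN", "CADASTRE"}


def check_record(record):
    """Return a list of problems found in one record (empty if none).

    Assumed (hypothetical) schema: {"text": str, "entities": [
    {"surface": str, "label": str, "start": int|None, "end": int|None}]}
    with end-exclusive character offsets.
    """
    problems = []
    text = record["text"]
    for ent in record["entities"]:
        if ent["label"] not in LABELS:
            problems.append(f"unknown label: {ent['label']}")
        start, end = ent.get("start"), ent.get("end")
        if start is None or end is None:
            continue  # offsets are only included where recoverable
        if text[start:end] != ent["surface"]:
            problems.append(f"offset mismatch for {ent['surface']!r}")
    return problems


def check_jsonl(lines):
    """Map 1-based line number -> problems, for lines that have any."""
    report = {}
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        problems = check_record(json.loads(line))
        if problems:
            report[n] = problems
    return report


# Invented sample lines, not taken from the dataset.
sample_lines = [
    json.dumps({
        "text": "Toshkent shahar sudi 2021-yil qaror chiqardi.",
        "entities": [
            {"surface": "Toshkent shahar sudi", "label": "COURT",
             "start": 0, "end": 20},
            {"surface": "2021-yil", "label": "DATE",
             "start": None, "end": None},
        ],
    }),
    json.dumps({
        "text": "Toshkent shahar sudi",
        "entities": [
            {"surface": "Toshkent", "label": "CITY", "start": 0, "end": 8},
            {"surface": "sudi", "label": "COURT", "start": 15, "end": 19},
        ],
    }),
]

report = check_jsonl(sample_lines)
print(report)
```

With a real file, the same function could be called on an open file handle, for example `check_jsonl(open(path, encoding="utf-8"))`, after adjusting the field names to match the published data dictionary.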

Topics

Topic Modeling; Artificial Intelligence in Law; Natural Language Processing Techniques

Identifiers

DOI: 10.12688/f1000research.180408.1

Citations and references

Cited by 0; 14 references