Human-Reviewed Uzbek Legal Named Entity Recognition Dataset
Abstract
<ns3:p>This article describes a human-reviewed named entity recognition (NER) dataset for Uzbek legal text, developed as a reusable resource for low-resource legal NLP. The release covers 12 entity categories: PER, ORG, LOC, DATE, MONEY, POSITION, DOCNO, LAW, COURT, BANK, TIN, and CADASTRE. The dataset is provided in XLSX, CSV, JSON, and JSONL formats and is organized into two complementary layers: a core subset of source-grounded records that can be manually reviewed, and an extended augmented subset that supports lower-frequency labels in training-oriented settings. The package also includes supporting documentation, split guidance, a data dictionary, and review-related metadata (provenance, verification status, and quality flags). Character-level start and end offsets are included where recoverable. The release is intended to facilitate Uzbek legal NER research, resource curation, and transparent, provenance-aware reuse.</ns3:p>
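As an illustration of how the JSONL release and its character-level offsets might be consumed, the following is a minimal sketch rather than the dataset's documented interface. The field names (`text`, `entities`, `label`, `start`, `end`, `surface`), the end-exclusive offset convention, and the sample records are all assumptions made for this example; the data dictionary shipped with the package is the authority on the actual schema.

```python
import json

# The 12 entity categories listed in the abstract.
LABELS = {
    "PER", "ORG", "LOC", "DATE", "MONEY", "POSITION",
    "DOCNO", "LAW", "COURT", "BANK", "TIN", "CADASTRE",
}


def check_record(record):
    """Return a list of problems found in one record (empty if none).

    Assumed, hypothetical schema:
      {"text": str,
       "entities": [{"label": str, "start": int|None,
                     "end": int|None, "surface": str}]}
    Offsets are assumed end-exclusive and may be None where they
    were not recoverable.
    """
    problems = []
    text = record["text"]
    for ent in record.get("entities", []):
        if ent["label"] not in LABELS:
            problems.append(f"unknown label: {ent['label']}")
        start, end = ent.get("start"), ent.get("end")
        if start is None or end is None:
            continue  # offsets not recoverable, nothing to verify
        if not (0 <= start < end <= len(text)):
            problems.append(f"offsets out of range: {start}-{end}")
        elif text[start:end] != ent["surface"]:
            problems.append(
                f"span mismatch: {text[start:end]!r} != {ent['surface']!r}"
            )
    return problems


def check_jsonl(lines):
    """Map 1-based line number to problems, for records that have any."""
    report = {}
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        problems = check_record(json.loads(line))
        if problems:
            report[lineno] = problems
    return report


# Made-up records for illustration only; they are not taken from the dataset.
sample = [
    json.dumps({"text": "Toshkent shahar sudi", "entities": [
        {"label": "COURT", "start": 0, "end": 20,
         "surface": "Toshkent shahar sudi"}]}),
    json.dumps({"text": "Toshkent shahar sudi", "entities": [
        {"label": "LOC", "start": 0, "end": 7, "surface": "Toshkent"}]}),
]
report = check_jsonl(sample)  # only the second record is flagged
```

To check a real file, the same function can be given an open file handle, for example `check_jsonl(open(path, encoding="utf-8"))`, where `path` points at the JSONL file in the release. Skipping records whose offsets are missing mirrors the statement that offsets are included only where recoverable.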