UzLegalNER v3_fixed: Uzbek Legal Contracts Named Entity Recognition Dataset (PER/ORG/LOC/POSITION/DATE/MONEY/DOCNO)

Bobur Saidov, Novosibirsk State University
Open MIND · repository · 2026 · uz
ABI

Abstract

UzLegalNER v3_fixed is a named entity recognition (NER) dataset for Uzbek legal contracts and related official documents. The dataset uses a seven-label schema: PER, ORG, LOC, POSITION, DATE, MONEY, and DOCNO. We release three formats: (i) a master spreadsheet (XLSX) with sentence-level metadata and character-level entity spans, (ii) a JSONL version with span annotations, and (iii) CoNLL BIO splits (train/dev/test) for standard NER training and benchmarking. Key fields are sent_id (unique per sentence), doc_id (document/group identifier for document-level splitting), doc_type, script (latin), split, text, and entities (start/end/label/text). Overlapping and nested spans are removed for CoNLL compatibility, with the longest span retained. The dataset is intended for training and evaluating Transformer-based NER models and gazetteer-enhanced methods, with a particular focus on robustness to unseen entity surface forms in legal text.
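
As an illustration of how the JSONL span annotations relate to the CoNLL BIO files, the following minimal sketch parses one record, resolves overlapping spans by keeping the longest, and derives BIO tags. It is not the dataset's actual conversion script. The field names (sent_id, doc_id, text, entities with start/end/label/text) come from the description above; the sample sentence, the identifier values, the exclusive `end` offset, the whitespace tokenization, and the helper names `keep_longest` and `to_bio` are all assumptions made for this example.

```python
import json

# One hypothetical JSONL line. The sentence and IDs are invented for illustration;
# "end" is assumed to be an exclusive character offset.
line = (
    '{"sent_id": "s1", "doc_id": "d1", '
    '"text": "Toshkent shahrida Aliyev Vali bilan shartnoma tuzildi", '
    '"entities": ['
    '{"start": 0, "end": 8, "label": "LOC", "text": "Toshkent"}, '
    '{"start": 18, "end": 24, "label": "PER", "text": "Aliyev"}, '
    '{"start": 18, "end": 29, "label": "PER", "text": "Aliyev Vali"}]}'
)


def keep_longest(entities):
    """Drop overlapping or nested spans, giving priority to longer spans."""
    kept = []
    # Visit longest spans first; ties are broken by the earlier start offset.
    ordered = sorted(entities, key=lambda e: (-(e["end"] - e["start"]), e["start"]))
    for ent in ordered:
        # Keep a span only if it does not overlap any span already kept.
        if all(ent["end"] <= k["start"] or ent["start"] >= k["end"] for k in kept):
            kept.append(ent)
    return sorted(kept, key=lambda e: e["start"])


def to_bio(text, entities):
    """Whitespace-tokenize text (an assumption) and tag tokens from character spans."""
    spans = keep_longest(entities)
    tokens, tags = [], []
    pos = 0
    for tok in text.split():
        start = text.index(tok, pos)  # character offset of this token
        end = start + len(tok)
        pos = end
        tag = "O"
        for ent in spans:
            if start < ent["end"] and end > ent["start"]:
                # The first token touching a span gets B-, later tokens get I-.
                prefix = "B" if start <= ent["start"] else "I"
                tag = f"{prefix}-{ent['label']}"
                break
        tokens.append(tok)
        tags.append(tag)
    return tokens, tags


record = json.loads(line)
tokens, tags = to_bio(record["text"], record["entities"])
for tok, tag in zip(tokens, tags):
    print(f"{tok}\t{tag}")
```

In this sketch the nested PER span "Aliyev" is discarded in favor of "Aliyev Vali", mirroring the longest-span rule described above. If the released files use a different tokenizer or an inclusive end offset, the offsets and token boundaries would need adjusting accordingly.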

