Article

A Cascade of Evaluation Biases in LLM-Based Knowledge Graph Verification

Anatoliy Kremenchutskiy, Bukhara State University

Zenodo (CERN European Organization for Nuclear Research) · repository · 2026 · en


Abstract

Large Language Models (LLMs) are increasingly deployed as automated evaluators for knowledge graph (KG) verification, yet the biases they introduce into this process remain poorly characterized. We present a systematic investigation of four interconnected evaluation biases (verbosity bias, acquiescence bias, negation asymmetry, and position bias) that form a compounding cascade in LLM-based KG verification. Using four locally deployed 7–9B parameter models (Qwen 2.5:7b, Gemma2:9b, Llama 3.1:8b, and Mistral:7b) evaluated on 42–100 knowledge graph triples across multiple datasets, we demonstrate that: (1) verbose model responses inflate verification accuracy by up to 47 percentage points (logistic regression OR = 1.90 per 10 additional words, p < 0.001); (2) acquiescence toward known-false triples ranges from 8.9% to 33.3% across models, with sharp domain-dependent variation (0–70% within a single model); (3) negation comprehension drops by 9.9–31.8 percentage points on false versus true triples; and (4) multiple-choice position bias reaches statistical significance (χ²(3) = 14.33, p < 0.01), with primacy effects of up to 100% for position A. These biases interact sequentially and may compound: verbosity inflates string-match scores, the inflated scores mask acquiescence, and acquiescence in turn compounds with negation failures, producing systematically over-optimistic verification results. We propose the cascade model as a diagnostic framework and discuss mitigation strategies for each bias layer.
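The two statistics quoted in the abstract can be read with a small sketch like the one below. It is an illustration only, not the authors' analysis code: the function names and the position counts are hypothetical, and the uniform expectation across the four answer positions is an assumption. The only values taken from the abstract are the odds ratio of 1.90 per 10 additional words and the three degrees of freedom of the position test.

```python
import math


def chi_square_gof(observed):
    """Pearson chi-square statistic against a uniform expectation (assumed)."""
    total = sum(observed)
    expected = total / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


def odds_ratio_for_words(or_per_10_words, extra_words):
    """Rescale an odds ratio reported per 10 words to another word count."""
    beta_per_word = math.log(or_per_10_words) / 10  # logit coefficient per word
    return math.exp(beta_per_word * extra_words)


# Hypothetical counts of how often positions A-D were selected (not study data).
hypothetical_counts = [40, 22, 20, 18]

# Standard chi-square critical value for df = 3 at alpha = 0.01.
CRITICAL_DF3_ALPHA_01 = 11.345

stat = chi_square_gof(hypothetical_counts)
print(f"chi2(3) = {stat:.2f}; exceeds 0.01 critical value: {stat > CRITICAL_DF3_ALPHA_01}")

# OR = 1.90 per 10 words (from the abstract) implies this OR for 20 extra words.
print(f"OR for 20 additional words: {odds_ratio_for_words(1.90, 20):.2f}")
```

Under this reading, the reported χ²(3) = 14.33 lies above the 11.345 critical value, which is consistent with the stated p < 0.01, and the odds of a response being scored as correct multiply by 1.90 for every additional 10 words.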


Topics

Advanced Graph Neural Networks; Topic Modeling; Machine Learning in Healthcare

Identifiers

DOI: 10.5281/zenodo.19379999

Citations and references

0 citations · 0 references used
