measurement
Applying difficulty-based weighting to the benchmark lowers the mean standard deviation across models by 0.12 percentage points, from 2.57% to 2.45%.
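The two reported means imply a small absolute change and a somewhat larger relative one. A minimal sketch of that arithmetic follows, using only the 2.57% and 2.45% figures stated above; the function name `std_reduction` is hypothetical and does not come from the source.

```python
def std_reduction(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute_pp = before_pct - after_pct            # difference between the two means
    relative_pct = absolute_pp / before_pct * 100   # drop as a share of the unweighted value
    return absolute_pp, relative_pct


# Values taken from the claim: unweighted 2.57%, weighted 2.45%.
absolute_pp, relative_pct = std_reduction(2.57, 2.45)
print(f"absolute: {absolute_pp:.2f} pp, relative: {relative_pct:.1f}%")
```

Under these inputs the drop is 0.12 percentage points, or roughly 4.7% of the unweighted standard deviation.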
Sources
- A Knowledge Graph-Based Hallucination Benchmark for Evaluating ... (arxiv.org)
Referenced by nodes (1)
- KGHaluBench concept