measurement
At a threshold of 0.700, the KGHaluBench entity-level filter achieved 5.65% higher alignment with human judgment and 48.78% higher recall than an automated judge based on GPT-3.5-Turbo.
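As a rough illustration of how figures like these might be computed, the sketch below scores a threshold filter against human labels. It is not taken from the KGHaluBench code: the function name `filter_metrics`, the assumption that a response passes when its score is at or above the threshold, and the definition of alignment as simple agreement rate are all hypothetical choices made for this example.

```python
from typing import Dict, Sequence


def filter_metrics(
    scores: Sequence[float],
    human_labels: Sequence[bool],
    threshold: float = 0.700,
) -> Dict[str, float]:
    """Compare a threshold filter with human judgments.

    Assumptions (hypothetical, not from the source):
    - a response "passes" the filter when score >= threshold;
    - human_labels[i] is True when a human judged response i as passing;
    - alignment is the fraction of items where filter and human agree.
    """
    if len(scores) != len(human_labels):
        raise ValueError("scores and human_labels must have the same length")
    if not scores:
        raise ValueError("at least one item is required")

    predictions = [s >= threshold for s in scores]

    # Agreement rate between the filter and the human labels.
    agreements = sum(p == h for p, h in zip(predictions, human_labels))

    # Recall on the "pass" class: TP / (TP + FN).
    true_pos = sum(p and h for p, h in zip(predictions, human_labels))
    false_neg = sum((not p) and h for p, h in zip(predictions, human_labels))
    positives = true_pos + false_neg
    recall = true_pos / positives if positives else 0.0

    return {"alignment": agreements / len(scores), "recall": recall}
```

Running the same function on the outputs of two judges (for example, a threshold filter and an LLM-based judge) over the same human-labeled set would give the pair of numbers needed for a comparison like the one stated above.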
Sources
- A Knowledge Graph-Based Hallucination Benchmark for Evaluating ... (arxiv.org)
Referenced by nodes (2)
- KGHaluBench concept
- entity-level filtering concept