measurement
The KGHaluBench tri-stage fact verification pipeline achieved 87.74% alignment with human judgment in the validation study, 8.56 percentage points higher than the 79.18% achieved by the automated judge using GPT-3.5-Turbo.
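The gap between the two alignment figures is an absolute difference in percentage points, not a relative increase. A minimal sketch of the arithmetic follows; the function and variable names are illustrative assumptions and do not come from the KGHaluBench paper or code.

```python
# Alignment-with-human-judgment figures as reported in the measurement above.
PIPELINE_ALIGNMENT = 87.74  # tri-stage fact verification pipeline, in percent
GPT35_ALIGNMENT = 79.18     # automated judge using GPT-3.5-Turbo, in percent


def percentage_point_gap(a: float, b: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return round(a - b, 2)


def relative_gain_percent(a: float, b: float) -> float:
    """Relative increase of a over b, expressed as a percent of b."""
    return round((a - b) / b * 100, 2)


if __name__ == "__main__":
    print(percentage_point_gap(PIPELINE_ALIGNMENT, GPT35_ALIGNMENT))
    print(relative_gain_percent(PIPELINE_ALIGNMENT, GPT35_ALIGNMENT))
```

The first value reproduces the 8.56-point gap stated above; the second is the same gap expressed relative to the GPT-3.5-Turbo baseline, a derived figure that the source does not report.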
Sources
- A Knowledge Graph-Based Hallucination Benchmark for Evaluating ... arxiv.org via serper
Referenced by nodes (1)
- KGHaluBench concept