measurement
The KGHaluBench entity-level filter, at a threshold of 0.700, achieved 5.65% higher alignment with human judgment and 48.78% higher recall than an automated judge based on GPT-3.5-Turbo.
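To make the two reported quantities concrete, the following is a minimal sketch of how a score-threshold filter could be compared against human labels. It is not the KGHaluBench implementation: the function names, the `>=` comparison at the threshold, the definition of alignment as simple label agreement, and the toy scores are all assumptions made for illustration.

```python
from typing import Sequence


def threshold_decisions(scores: Sequence[float], threshold: float = 0.700) -> list[bool]:
    """Turn per-response scores into accept/reject decisions.

    Assumption: a response passes the filter when its score is >= threshold.
    """
    return [s >= threshold for s in scores]


def agreement(pred: Sequence[bool], human: Sequence[bool]) -> float:
    """Fraction of items where the filter decision matches the human label."""
    if len(pred) != len(human):
        raise ValueError("pred and human must have the same length")
    if not human:
        raise ValueError("no items to compare")
    return sum(p == h for p, h in zip(pred, human)) / len(human)


def recall(pred: Sequence[bool], human: Sequence[bool]) -> float:
    """Fraction of human-accepted items that the filter also accepted."""
    if len(pred) != len(human):
        raise ValueError("pred and human must have the same length")
    positives = sum(human)
    if positives == 0:
        return 0.0
    true_positives = sum(p and h for p, h in zip(pred, human))
    return true_positives / positives


# Hypothetical toy data, not taken from the benchmark.
scores = [0.91, 0.70, 0.42, 0.66, 0.88]
human = [True, True, False, True, True]

pred = threshold_decisions(scores)
print(agreement(pred, human), recall(pred, human))
```

Running the same two functions on the decisions of a second judge (for example an LLM-based one) and on the same human labels would give the pair of numbers being compared. The source does not state whether the reported percentages are relative or absolute differences.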
