procedure
The automated judge prompts used in the human validation study for comparing KGHaluBench’s entity-level and fact-level filters against GPT-3.5-Turbo are configured with a Temperature of 0 and Max Tokens of 10.
Authors
Sources
- A Knowledge Graph-Based Hallucination Benchmark for Evaluating ... arxiv.org via serper
Referenced by nodes (1)
- KGHaluBench concept