reference
The MultiHal benchmark is a knowledge-graph-grounded benchmark for factual language modeling. It extends prior hallucination and factuality benchmarks (SHROOM-2024, HaluEval, HaluBench, TruthfulQA, FELM, DefAN, and SimpleQA) by mining relevant knowledge-graph paths from Wikidata.
Sources
- EdinburghNLP/awesome-hallucination-detection (GitHub)
Referenced by nodes (3)
- TruthfulQA concept
- Wikidata entity
- HaluEval concept