reference
HaluBench is a partially synthetic hallucination benchmark dataset in which the negative examples (non-hallucinated answers) are derived from existing question-answering benchmarks, including HaluEval, DROP, CovidQA, FinanceBench, and PubMedQA.
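Because the non-hallucinated answers are the negatives, hallucinated answers are the positive class when a detector (for example, an LLM-as-a-judge) is scored against the benchmark. The sketch below illustrates that convention. The `Example` record, its fields, and the `precision_recall` helper are hypothetical and do not reflect HaluBench's actual schema. The rows are toy placeholders and are not drawn from the dataset.

```python
from dataclasses import dataclass


@dataclass
class Example:
    # Hypothetical record shape, not HaluBench's real schema.
    source: str          # originating QA benchmark, e.g. "DROP"
    hallucinated: bool   # True = positive (hallucinated), False = negative


def precision_recall(examples, predictions):
    """Score detector verdicts, treating hallucinated answers as positives."""
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    pairs = list(zip(examples, predictions))
    tp = sum(1 for ex, flagged in pairs if flagged and ex.hallucinated)
    fp = sum(1 for ex, flagged in pairs if flagged and not ex.hallucinated)
    fn = sum(1 for ex, flagged in pairs if not flagged and ex.hallucinated)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data for illustration only, not taken from HaluBench.
examples = [
    Example("DROP", False),
    Example("DROP", True),
    Example("CovidQA", True),
    Example("PubMedQA", False),
]
predictions = [False, True, False, True]  # detector flags (True = "hallucinated")
print(precision_recall(examples, predictions))  # (0.5, 0.5)
```

Keeping the originating benchmark on each record also makes it straightforward to break the scores down per source dataset.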
Sources
- Detecting hallucinations with LLM-as-a-judge: Prompt ... - Datadog (www.datadoghq.com)
Referenced by nodes (5)
- CovidQA concept
- DROP concept
- PubMedQA concept
- HaluEval concept
- FinanceBench concept