Reference
Existing benchmarks for evaluating hallucinations in large language models include TruthfulQA (Lin et al., 2022), HallucinationEval (Wu et al., 2023), QAFactEval (Fabbri et al., 2022), and CohS (Kazemi et al., 2023).
Sources
- Survey and analysis of hallucinations in large language models (www.frontiersin.org)
Referenced by nodes (1)
- TruthfulQA concept