Reference
Existing benchmarks for evaluating hallucinations in large language models include TruthfulQA (Lin et al., 2022), HallucinationEval (Wu et al., 2023), QAFactEval (Fabbri et al., 2022), and CohS (Kazemi et al., 2023).
