Relations (1)
related 2.00 — strongly supported by 3 facts
HaluEval is a benchmark specifically designed to evaluate hallucination in Large Language Models, introduced by Li et al. (2023) [1]. It serves as a dataset for hallucination detection in these models [2] and provides a collection of generated and human-annotated samples used to assess how well Large Language Models recognize hallucinations [3].
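Benchmarks of this kind score a model on whether its yes/no verdicts match the gold hallucination labels attached to each sample. The sketch below illustrates that recognition-accuracy computation; the field names and the toy `judge` function are illustrative assumptions, not HaluEval's actual schema or evaluation code.

```python
# Illustrative hallucination-recognition scoring over labeled samples.
# Field names ("answer", "is_hallucinated") are assumptions for this sketch.
samples = [
    {"question": "Who wrote Hamlet?", "answer": "Shakespeare", "is_hallucinated": "no"},
    {"question": "Who wrote Hamlet?", "answer": "Dickens", "is_hallucinated": "yes"},
]

def judge(sample):
    """Stand-in for a model's yes/no hallucination verdict (hypothetical)."""
    return "yes" if sample["answer"] == "Dickens" else "no"

def recognition_accuracy(samples, judge_fn):
    """Fraction of samples where the model's verdict matches the gold label."""
    correct = sum(judge_fn(s) == s["is_hallucinated"] for s in samples)
    return correct / len(samples)

print(recognition_accuracy(samples, judge))
```

In a real run, `judge` would be replaced by a prompt to the model under evaluation, and accuracy would typically be reported per task split (QA, Dialogue, Summarisation).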
Facts (3)
Sources
Re-evaluating Hallucination Detection in LLMs (arxiv.org) — 1 fact
Reference: Li et al. (2023) created 'HaluEval', a large-scale benchmark for evaluating hallucinations in Large Language Models.
The Hallucinations Leaderboard, an Open Effort to Measure ... (huggingface.co) — 1 fact
Reference: FaithDial, True-False, and HaluEval (covering QA, Dialogue, and Summarisation) are datasets specifically designed to target hallucination detection in Large Language Models.
EdinburghNLP/awesome-hallucination-detection (github.com) — 1 fact
Claim: HaluEval is a collection of generated and human-annotated hallucinated samples used for evaluating the performance of large language models in recognizing hallucinations.