Relations (1)
related 2.00 — strongly supported by 3 facts
HaluEval is a benchmark specifically designed to evaluate hallucination in Large Language Models, introduced by Li et al. (2023) [1]. It serves as a dataset for hallucination detection in these models [2] and provides a collection of generated and human-annotated samples used to assess how well Large Language Models recognize hallucinations [3].
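Benchmarks of this kind score a model on whether its yes/no verdicts match the gold hallucination labels attached to each sample. The sketch below illustrates that recognition-accuracy computation; the field names and the toy `judge` function are illustrative assumptions, not HaluEval's actual schema or evaluation code.

```python
# Illustrative hallucination-recognition scoring over labeled samples.
# Field names ("answer", "is_hallucinated") are assumptions for this sketch.
samples = [
    {"question": "Who wrote Hamlet?", "answer": "Shakespeare", "is_hallucinated": "no"},
    {"question": "Who wrote Hamlet?", "answer": "Dickens", "is_hallucinated": "yes"},
]

def judge(sample):
    """Stand-in for a model's yes/no hallucination verdict (hypothetical)."""
    return "yes" if sample["answer"] == "Dickens" else "no"

def recognition_accuracy(samples, judge_fn):
    """Fraction of samples where the model's verdict matches the gold label."""
    correct = sum(judge_fn(s) == s["is_hallucinated"] for s in samples)
    return correct / len(samples)

print(recognition_accuracy(samples, judge))
```

In a real run, `judge` would be replaced by a prompt to the model under evaluation, and accuracy would typically be reported per task split (QA, Dialogue, Summarisation).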
Facts (3)
Sources
Re-evaluating Hallucination Detection in LLMs (arxiv.org) — 1 fact
Reference: Li et al. (2023) created 'HaluEval', a large-scale benchmark for evaluating hallucinations in Large Language Models.
The Hallucinations Leaderboard, an Open Effort to Measure ... (huggingface.co) — 1 fact
Reference: FaithDial, True-False, and HaluEval (covering QA, Dialogue, and Summarisation) are datasets specifically designed to target hallucination detection in Large Language Models.
EdinburghNLP/awesome-hallucination-detection (github.com) — 1 fact
Claim: HaluEval is a collection of generated and human-annotated hallucinated samples used for evaluating the performance of large language models in recognizing hallucinations.