Relations (1)

related 2.00 — strongly supporting 3 facts

Perplexity is an uncertainty-based method used for hallucination detection in large language models [1], and its performance in this role is explicitly measured and compared against other evaluation criteria {fact:1, fact:2}.

Facts (3)

Sources
Re-evaluating Hallucination Detection in LLMs - arXiv (arxiv.org) - 3 facts
reference: Uncertainty-based methods for hallucination detection in large language models include Perplexity (Ren et al., 2023), Length-Normalized Entropy (LN-Entropy) (Malinin and Gales, 2021), and Semantic Entropy (SemEntropy) (Farquhar et al., 2024), which utilize multiple generations to capture sequence-level uncertainty.
measurement: Existing hallucination detection methods experience performance drops of up to 45.9% for Perplexity and 30.4% for Eigenscore when evaluated using LLM-as-Judge criteria compared to ROUGE.
measurement: The Perplexity hallucination detection method sees its AUROC score decrease by as much as 45.9% for the Mistral model on the NQ-Open dataset when switching from ROUGE to LLM-as-Judge evaluation.
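The facts above describe Perplexity as a sequence-level uncertainty signal for hallucination detection. A minimal sketch of that idea, assuming per-token log-probabilities are already available from the model (the threshold value is a hypothetical tuning knob, not taken from the paper):

```python
import math

def sequence_perplexity(token_logprobs):
    """Perplexity of a generated sequence from its per-token log-probabilities.

    Higher perplexity means the model was less confident in its own output,
    which uncertainty-based detectors treat as a hallucination signal.
    """
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

def flag_hallucination(token_logprobs, threshold=4.0):
    # threshold is a hypothetical cutoff, tuned per model and dataset
    return sequence_perplexity(token_logprobs) > threshold
```

For example, a sequence the model assigns near-certain probability to (log-probs near 0) yields a perplexity near 1 and is not flagged, while a low-probability sequence yields high perplexity and is flagged.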