hallucination detection ↔ Eigenscore

Relations (1)

related (score 2.58), strongly supporting: 5 facts

Eigenscore is explicitly identified as a consistency-based method for hallucination detection in [1], and its performance and limitations in this specific task are analyzed in [2], [3], [4], and [5].

Facts (5)

Sources

Re-evaluating Hallucination Detection in LLMs (arXiv, arxiv.org): 5 facts

claim: The Mean-Len metric matches or outperforms more sophisticated hallucination detection approaches, such as Eigenscore and LN-Entropy, across multiple datasets.
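To make the comparison concrete, here is a minimal sketch of a length-only baseline. It assumes that Mean-Len scores a prompt by the average length of several responses sampled for it. The whitespace tokenization, the function name `mean_len_score`, and the example strings are illustrative assumptions, not the paper's code.

```python
# Hypothetical sketch of a length-only baseline in the spirit of Mean-Len.
# Assumption: the score is the mean length (here, whitespace-token count) of
# several responses sampled for one prompt. The paper's tokenization and any
# normalization may differ.
from statistics import mean


def mean_len_score(responses: list[str]) -> float:
    """Return the average whitespace-token length of the sampled responses."""
    if not responses:
        raise ValueError("need at least one sampled response")
    return mean(len(r.split()) for r in responses)


if __name__ == "__main__":
    samples = [
        "Paris.",
        "The capital of France is Paris.",
        "Paris is the capital.",
    ]
    print(mean_len_score(samples))
```

The point of such a baseline is that it uses no model internals at all. If it ranks hallucinated answers about as well as an embedding-based score, the more complex score may be exploiting the same length signal.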

reference: Consistency-based methods for hallucination detection in large language models include EigenScore (Chen et al., 2024), which computes generation consistency from eigenvalue spectra, and LogDet (Sriramanan et al., 2024a), which measures covariance structure from a single generation.
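The eigenvalue idea can be sketched as follows. This follows the EigenScore formulation as we understand it from Chen et al. (2024): take one embedding per sampled response, center each embedding over its feature dimensions, form the K×K Gram matrix Σ, and average the log-eigenvalues of Σ + αI. Because the sum of log-eigenvalues equals the log-determinant, the sketch uses a Cholesky factorization instead of an eigendecomposition. The embeddings, the default α = 1e-3, and the helper names are assumptions for illustration. How the embeddings are extracted from the model is not modeled here.

```python
# Sketch of an EigenScore-style consistency score (assumed formulation):
#   score = (1/K) * log det(Sigma + alpha * I) = (1/K) * sum_i log(lambda_i)
# where Sigma is the K x K Gram matrix of centered response embeddings.
import math


def _center(z: list[float]) -> list[float]:
    # Subtract the vector's mean over its d feature dimensions.
    m = sum(z) / len(z)
    return [x - m for x in z]


def _log_det_spd(a: list[list[float]]) -> float:
    # Log-determinant of a symmetric positive-definite matrix via Cholesky:
    # det(A) = prod(L_ii)^2, so log det(A) = 2 * sum(log L_ii).
    n = len(a)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                chol[i][i] = math.sqrt(s)
            else:
                chol[i][j] = s / chol[j][j]
    return 2.0 * sum(math.log(chol[i][i]) for i in range(n))


def eigenscore(embeddings: list[list[float]], alpha: float = 1e-3) -> float:
    """Higher values mean the K sampled responses are less mutually consistent."""
    k = len(embeddings)
    zs = [_center(z) for z in embeddings]
    # Gram matrix of centered embeddings plus alpha * I for numerical stability.
    gram = [
        [sum(a * b for a, b in zip(zs[i], zs[j])) + (alpha if i == j else 0.0)
         for j in range(k)]
        for i in range(k)
    ]
    return _log_det_spd(gram) / k


if __name__ == "__main__":
    consistent = [[1.0, 0.0, 0.0, 0.0]] * 3          # three identical answers
    diverse = [[1.0, 0.0, 0.0, 0.0],
               [0.0, 1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, 0.0]]                 # three unrelated answers
    print(eigenscore(consistent), eigenscore(diverse))
```

Identical embeddings leave only one large eigenvalue, so the α-sized eigenvalues dominate the average and the score is strongly negative. Spread-out embeddings give several large eigenvalues and a higher score.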

measurement: On the NQ-Open dataset, the performance of the Eigenscore hallucination detection method erodes by 19.0% for the Llama model and 30.4% for the Mistral model when evaluation switches from ROUGE to LLM-as-Judge.

claim: The hallucination detection methods Eigenscore and eRank correlate strongly with response length, suggesting that these methods may primarily detect length variation rather than semantic features.
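A length-correlation check of this kind can be sketched as a correlation between per-example detector scores and response lengths. This sketch uses Pearson correlation. Whether the source uses Pearson, Spearman, or another measure is an assumption, and the example values are invented for illustration only.

```python
# Sketch: Pearson correlation between detector scores and response lengths.
# A value near +1 or -1 means the score is largely predictable from length.
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0.0 or sy == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical values, not taken from the paper.
    lengths = [5.0, 12.0, 20.0, 33.0]
    scores = [-3.1, -2.0, -1.2, 0.4]
    print(pearson(scores, lengths))
```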

measurement: Existing hallucination detection methods show performance drops of up to 45.9% for Perplexity and 30.4% for Eigenscore when evaluated with LLM-as-Judge criteria instead of ROUGE.
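The source does not state how these percentages are computed. One plausible reading is a relative drop with respect to the ROUGE-based score. The sketch below is written under that assumption, and the inputs are hypothetical. An absolute drop in metric points would be a different quantity.

```python
# Sketch: relative performance drop when the labelling criterion changes.
# Assumption: drop% = 100 * (score_rouge - score_judge) / score_rouge.


def relative_drop_pct(score_rouge: float, score_judge: float) -> float:
    if score_rouge <= 0.0:
        raise ValueError("reference score must be positive")
    return 100.0 * (score_rouge - score_judge) / score_rouge


if __name__ == "__main__":
    # Hypothetical detector scores under the two labelling schemes.
    print(relative_drop_pct(0.80, 0.60))
```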