concept

EigenScore

Facts (10)

Sources

Re-evaluating Hallucination Detection in LLMs - arXiv (arxiv.org), Aug 13, 2025, 5 facts

claim: The Mean-Len metric matches or outperforms sophisticated hallucination detection approaches such as EigenScore and LN-Entropy across multiple datasets.

reference: Consistency-based methods for hallucination detection in large language models include EigenScore (Chen et al., 2024), which computes generation consistency via eigenvalue spectra, and LogDet (Sriramanan et al., 2024a), which measures covariance structure from single generations.

measurement: On the NQ-Open dataset, the EigenScore hallucination detection method shows a performance drop of 19.0% for the Llama model and 30.4% for the Mistral model when evaluation switches from ROUGE to LLM-as-Judge.

claim: The hallucination detection methods EigenScore and eRank correlate strongly with response length, suggesting that they may primarily detect length variations rather than semantic features.
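A length-confound check of the kind this claim implies can be sketched as follows. This is a minimal illustration, not the source's procedure: the function name `length_correlation`, the whitespace word count used as "length", and the choice of the Pearson coefficient are all assumptions, since the fact does not say how length or correlation was measured.

```python
import numpy as np

def length_correlation(scores, responses):
    """Pearson correlation between detector scores and response lengths.

    Hypothetical helper: 'length' is a whitespace word count, an assumption.
    """
    lengths = np.array([len(r.split()) for r in responses], dtype=float)
    scores = np.asarray(scores, dtype=float)
    # np.corrcoef returns the 2x2 correlation matrix; take the off-diagonal entry
    return float(np.corrcoef(scores, lengths)[0, 1])

# Toy data (made up for illustration): scores that grow with response length
responses = ["Paris", "It is Paris", "The capital city is Paris", "I think the capital of France is probably Paris"]
scores = [0.1, 0.3, 0.5, 0.9]
r = length_correlation(scores, responses)
```

A coefficient near 1 on real data would be consistent with the detector tracking length; it would not by itself show that the detector ignores semantics.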

measurement: Existing hallucination detection methods show performance drops of up to 45.9% for Perplexity and 30.4% for EigenScore when evaluated with LLM-as-Judge criteria instead of ROUGE.

EdinburghNLP/awesome-hallucination-detection - GitHub (github.com), 5 facts

claim: The INSIDE framework and its EigenScore metric were evaluated on LLaMA and OPT models across question answering benchmarks, where they improved detection over uncertainty-based and lexical-similarity baselines.

claim: EigenScore is a metric that uses the eigenvalues of the covariance matrix of multiple response embeddings to measure semantic consistency in large language models.
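The computation described in this claim can be sketched as below. This is a minimal illustration under stated assumptions, not the reference implementation: the function name `eigenscore`, the per-embedding centering, the ridge term `alpha`, and the use of the mean log-eigenvalue (a normalized log-determinant) are assumptions about the formulation that the fact itself does not spell out. Real use would take sentence embeddings from the model's hidden states; random vectors stand in for them here.

```python
import numpy as np

def eigenscore(embeddings, alpha=1e-3):
    """Mean log-eigenvalue of a regularized covariance matrix of K response embeddings.

    embeddings: array of shape (K, d), one row per sampled response.
    alpha: small ridge term (an assumed value) that keeps the matrix full rank.
    """
    Z = np.asarray(embeddings, dtype=float)
    K = Z.shape[0]
    # Center each response embedding across its d feature dimensions
    Zc = Z - Z.mean(axis=1, keepdims=True)
    # K x K covariance (Gram) matrix between the sampled responses
    cov = Zc @ Zc.T
    # eigvalsh is appropriate because the matrix is symmetric
    eigvals = np.linalg.eigvalsh(cov + alpha * np.eye(K))
    # Guard against tiny non-positive values caused by floating-point error
    eigvals = np.maximum(eigvals, 1e-12)
    return float(np.mean(np.log(eigvals)))

# Placeholder embeddings: five identical responses vs. five unrelated ones
rng = np.random.default_rng(0)
consistent = np.tile(rng.normal(size=16), (5, 1))
diverse = rng.normal(size=(5, 16))
low, high = eigenscore(consistent), eigenscore(diverse)
```

Under these assumptions, identical responses give a rank-one covariance matrix and a low score, while semantically scattered responses spread the eigenvalue spectrum and raise it, so a higher score indicates less consistency.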

claim: Metrics used for hallucination detection include SelfCheckGPT, FactScore, EigenScore, Efficient EigenScore (EES), Semantic Entropy, Perplexity, HaluEval Accuracy, and ROUGE-1 (XSum).

measurement: Established hallucination detection methods, including Perplexity, EigenScore, and eRank, suffer performance drops of up to 45.9% AUROC when evaluated with human-aligned LLM-as-Judge metrics instead of ROUGE.
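The effect of swapping the correctness labels can be illustrated with a small AUROC sketch. Everything here is hypothetical: the function name `auroc`, the toy scores, and both label sets are made up for illustration, and the source does not state whether the reported drops are absolute or relative, so the sketch only shows that the same detector scores can yield different AUROC values under different label sources.

```python
def auroc(scores, labels):
    """Pairwise AUROC: probability that a positive example outscores a negative one.

    labels: 1 = hallucinated, 0 = correct. Ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Same detector scores, two hypothetical label sources for the same answers
scores = [0.9, 0.8, 0.3, 0.1]
rouge_labels = [1, 1, 0, 0]   # labels derived from a ROUGE threshold (made up)
judge_labels = [1, 0, 1, 0]   # labels from an LLM judge (made up)
auroc_rouge = auroc(scores, rouge_labels)
auroc_judge = auroc(scores, judge_labels)
```

For larger evaluations a library routine such as scikit-learn's `roc_auc_score` would be the usual choice; the quadratic loop above is kept only for readability.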

claim: Efficient EigenScore (EES) is an unsupervised metric that approximates EigenScore at twice the speed.