reference
A white-box hallucination detector approach treats the Large Language Model as a dynamic graph and analyzes structural properties of internal attention mechanisms. This method extracts spectral features, specifically eigenvalues, from attention maps to predict fabrication: factual retrieval produces stable eigen-structures, while hallucination leads to diffuse, chaotic patterns. This detector operates independently of generated semantic content and was evaluated across seven QA benchmarks (NQ-Open, TriviaQA, CoQA, SQuADv2, HaluEval-QA, TruthfulQA, GSM8K) using AUROC, Precision, Recall, and Cohen's Kappa metrics.

Authors

Sources

Referenced by nodes (9)