reference
A white-box approach to hallucination detection treats the Large Language Model as a dynamic graph and analyzes structural properties of its internal attention mechanisms. The method extracts spectral features, specifically eigenvalues, from attention maps to predict fabrication: factual retrieval produces stable eigen-structures, whereas hallucination yields diffuse, chaotic patterns. Because the detector operates on attention structure rather than on the generated semantic content, it is content-independent; it was evaluated across seven QA benchmarks (NQ-Open, TriviaQA, CoQA, SQuADv2, HaluEval-QA, TruthfulQA, GSM8K) using AUROC, Precision, Recall, and Cohen's Kappa.
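The feature-extraction idea can be sketched as follows. This is a minimal illustration, not the authors' exact pipeline: the function name, the symmetrization step, and the specific summary statistics (largest eigenvalue, spectral gap, spectral entropy) are assumptions chosen to show how eigenvalues of an attention map can be turned into a fixed-size feature vector for a downstream detector.

```python
import numpy as np

def attention_spectral_features(attn: np.ndarray) -> dict:
    """Summarize the eigen-spectrum of one attention map.

    attn: (seq_len, seq_len) row-stochastic attention matrix from a
    single head. We symmetrize it so the map can be read as a weighted
    undirected graph with real eigenvalues.
    """
    sym = 0.5 * (attn + attn.T)
    eigvals = np.linalg.eigvalsh(sym)  # real, sorted ascending
    # Normalize eigenvalue magnitudes into a distribution and compute
    # its entropy as one scalar summary of the spectrum's shape.
    mags = np.abs(eigvals)
    p = mags / mags.sum()
    entropy = -np.sum(p * np.log(p + 1e-12))
    return {
        "max_eig": eigvals[-1],
        "spectral_gap": eigvals[-1] - eigvals[-2],
        "spectral_entropy": entropy,
    }

# Toy comparison: a sharp, near-diagonal attention map versus a fully
# uniform (rank-1) one. A uniform map collapses to a single dominant
# eigenvalue, while a near-diagonal map spreads mass across the spectrum.
n = 16
sharp = np.eye(n) * 0.9 + np.full((n, n), 0.1 / n)
diffuse = np.full((n, n), 1.0 / n)
f_sharp = attention_spectral_features(sharp)
f_diffuse = attention_spectral_features(diffuse)
```

In practice such features would be computed per layer and per head (e.g. from attention tensors exposed by the model) and fed to a lightweight classifier trained against hallucination labels.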
Authors
Sources
- EdinburghNLP/awesome-hallucination-detection (GitHub)
Referenced by nodes (9)
- TruthfulQA concept
- Precision concept
- Recall concept
- TriviaQA concept
- AUROC concept
- SQuAD concept
- HaluEval concept
- NQ-Open concept
- attention mechanism concept