measurement
Existing hallucination detection methods perform worse when evaluated against LLM-as-a-judge criteria than against ROUGE, with performance drops of up to 45.9% for Perplexity and 30.4% for Eigenscore.
Sources
- Re-evaluating Hallucination Detection in LLMs (arXiv, arxiv.org)
Referenced by nodes (5)
- Perplexity concept
- Eigenscore concept
- hallucination detection concept
- ROUGE concept
- LLM-as-a-judge concept