measurement
The AUROC of the Perplexity hallucination detection method drops by as much as 45.9% for the Mistral model on the NQ-Open dataset when evaluation switches from ROUGE to LLM-as-Judge.
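To make the measurement concrete, here is a minimal sketch of how a detector's AUROC can change when only the correctness labels change. The scores and both label sets are hypothetical toy values, not data from the cited paper, and the helper `auroc` is an illustrative pairwise implementation rather than the authors' evaluation code. It assumes higher perplexity means "more likely hallucinated" and that a label of 1 marks a hallucinated answer. The claim above does not say whether 45.9% is a relative change or a difference in AUROC points, so the sketch computes both.

```python
def auroc(scores, labels):
    """AUROC as the probability that a random positive example
    outscores a random negative one (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical perplexity scores for eight generated answers.
perplexity = [1.2, 1.5, 2.0, 2.8, 3.5, 4.1, 5.0, 6.3]

# Hypothetical labels (1 = hallucinated) from two different evaluators.
rouge_labels = [0, 0, 0, 0, 1, 1, 1, 1]  # e.g. ROUGE score below a threshold
judge_labels = [0, 1, 0, 1, 0, 1, 0, 1]  # e.g. an LLM judge's verdict

auroc_rouge = auroc(perplexity, rouge_labels)
auroc_judge = auroc(perplexity, judge_labels)

# The same detector scores, evaluated against two label sets.
absolute_drop = auroc_rouge - auroc_judge      # difference in AUROC points
relative_drop = absolute_drop / auroc_rouge    # fraction of the original AUROC

print(f"AUROC vs ROUGE labels: {auroc_rouge:.3f}")
print(f"AUROC vs judge labels: {auroc_judge:.3f}")
print(f"absolute drop: {absolute_drop:.3f}, relative drop: {relative_drop:.1%}")
```

The detector's scores never change in this sketch; only the ground-truth labels do, which is why a shift in the evaluation method alone can move the reported AUROC.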
Sources
- Re-evaluating Hallucination Detection in LLMs, arXiv (arxiv.org)
Referenced by nodes (6)
- Perplexity concept
- NQ-Open concept
- Mistral AI entity
- hallucination detection concept
- ROUGE concept
- LLM-as-a-judge concept