Fact — measurement — Knowledge Tree

The AUROC of the Perplexity hallucination detection method drops by as much as 45.9% for the Mistral model on the NQ-Open dataset when evaluation switches from ROUGE to LLM-as-Judge.
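To illustrate how a detector's AUROC can change when only the ground-truth labels change, here is a minimal sketch. The perplexity scores and both label sets are invented toy values, not data from the source, and the helper names (`auroc`, `relative_drop_pct`) are hypothetical. This page does not say whether the 45.9% figure is a relative or an absolute decrease, so the sketch computes both.

```python
def auroc(scores, labels):
    """AUROC as the probability that a positive example outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def relative_drop_pct(before, after):
    """Decrease from `before` to `after`, as a percentage of `before`."""
    return 100.0 * (before - after) / before


# Toy detector scores: higher perplexity is read as "more likely hallucinated".
perplexity = [1.2, 3.5, 2.1, 4.8, 1.9, 3.0]

# Toy ground truth (1 = hallucination) from two hypothetical labelers.
rouge_labels = [0, 1, 0, 1, 0, 1]  # e.g. ROUGE overlap below a threshold
judge_labels = [0, 1, 1, 0, 0, 1]  # e.g. verdict of an LLM judge

auroc_rouge = auroc(perplexity, rouge_labels)
auroc_judge = auroc(perplexity, judge_labels)

absolute_drop_points = 100.0 * (auroc_rouge - auroc_judge)
relative_drop = relative_drop_pct(auroc_rouge, auroc_judge)

print(f"AUROC vs ROUGE labels: {auroc_rouge:.3f}")
print(f"AUROC vs judge labels: {auroc_judge:.3f}")
print(f"absolute drop: {absolute_drop_points:.1f} points")
print(f"relative drop: {relative_drop:.1f}%")
```

The detector scores are identical in both runs. Only the labels differ, which is why a change of evaluation metric alone can move the reported AUROC.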

Authors

Person: Not available
Organization: arXiv
Re-evaluating Hallucination Detection in LLMs - arXiv

Sources

Re-evaluating Hallucination Detection in LLMs - arXiv (arxiv.org)

Referenced by nodes (6)

Perplexity concept
NQ-Open concept
Mistral AI entity
hallucination detection concept
ROUGE concept
LLM-as-a-judge concept