claim
Hallucination detection methods that perform well under ROUGE often show a substantial performance drop when re-evaluated using the 'LLM-as-Judge' paradigm.
Authors
Sources
- Re-evaluating Hallucination Detection in LLMs - arXiv arxiv.org via serper
Referenced by nodes (3)
- hallucination detection concept
- ROUGE concept
- LLM-as-a-judge concept