measurement
The eRank hallucination detection method shows performance declines of 30.6% and 36.4% when evaluated under the LLM-as-a-judge paradigm rather than with ROUGE-based scoring.
Sources
- Re-evaluating Hallucination Detection in LLMs (arXiv, arxiv.org)
Referenced by nodes (3)
- hallucination detection concept
- ROUGE concept
- LLM-as-a-judge concept