claim
The authors of 'Re-evaluating Hallucination Detection in LLMs' found that ROUGE exhibits high precision but fails to detect many hallucinations (that is, its recall is low), whereas the LLM-as-Judge method achieves significantly higher recall and aligns more closely with human assessments.
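To make the precision/recall contrast in this claim concrete, here is a minimal sketch of how the two quantities could be computed for a detector against human labels. It assumes "hallucination" is the positive class, which the claim implies but does not state. The function name `precision_recall` and all labels are hypothetical toy values, not data or code from the paper.

```python
def precision_recall(predicted, human):
    """Precision and recall, treating True ('hallucination') as the positive class.

    predicted: detector labels, True where the detector flags a hallucination.
    human:     human labels, True where annotators judged a hallucination.
    """
    pairs = list(zip(predicted, human))
    tp = sum(p and h for p, h in pairs)        # flagged and truly hallucinated
    fp = sum(p and not h for p, h in pairs)    # flagged but actually correct
    fn = sum(h and not p for p, h in pairs)    # hallucinated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy labels (NOT results from the paper): True = hallucination.
human         = [True, True, True, True, False, False]
cautious_flag = [True, False, False, False, False, False]  # flags rarely
broad_flag    = [True, True, True, False, True, False]     # flags more often

print(precision_recall(cautious_flag, human))  # (1.0, 0.25)
print(precision_recall(broad_flag, human))     # (0.75, 0.75)
```

In this toy setup the cautious detector is never wrong when it flags something (precision 1.0) yet misses three of the four hallucinations (recall 0.25), which is the pattern the claim attributes to ROUGE.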
Authors
Sources
- Re-evaluating Hallucination Detection in LLMs (arXiv, arxiv.org)
Referenced by nodes (2)
- ROUGE concept
- LLM-as-a-judge concept