Claim
According to Thakur et al. (2025), an 'LLM-as-a-judge' evaluation method agrees more closely with human assessments of factual correctness than ROUGE does (see the agreement sketch after this card).
Authors
- Thakur et al. (2025)
Sources
- Re-evaluating Hallucination Detection in LLMs (arXiv, arxiv.org)
Referenced by nodes (3)
- factual correctness concept
- ROUGE concept
- LLM-as-a-judge concept
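A minimal sketch of how metric-human agreement might be measured. It assumes per-example human factuality labels alongside ROUGE and LLM-judge scores; all data and names below are hypothetical, and Thakur et al.'s actual protocol is not reproduced here.

```python
# Sketch: compare how well two automatic metrics (ROUGE-L F1 and an
# LLM-as-a-judge score) track human factual-correctness labels.
# All scores below are made up for illustration.
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-example scores (human label 1 = factually correct).
human_labels = [1, 0, 1, 1, 0, 1, 0, 0]
rouge_l_f1   = [0.42, 0.55, 0.38, 0.61, 0.47, 0.50, 0.58, 0.33]
judge_scores = [0.90, 0.15, 0.80, 0.95, 0.20, 0.85, 0.30, 0.10]

print("ROUGE vs human:", round(pearson(rouge_l_f1, human_labels), 3))
print("Judge vs human:", round(pearson(judge_scores, human_labels), 3))
```

Pearson correlation is only one plausible agreement statistic; the paper may use a different one (e.g., rank correlation or percentage agreement with human judgments).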