Claim
The 'LLM-as-Judge' evaluation method aligns more closely with human judgments of factual correctness than ROUGE does, as validated by the human study conducted by the authors of 'Re-evaluating Hallucination Detection in LLMs'.
Authors
Sources
- Re-evaluating Hallucination Detection in LLMs (arXiv, arxiv.org)
Referenced by nodes (3)
- factual correctness concept
- ROUGE concept
- LLM-as-a-judge concept
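To make the contrast behind the claim concrete, here is a minimal sketch of the two evaluation signals being compared: a surface-overlap ROUGE-L score versus an LLM-as-Judge verdict. This is an illustration under stated assumptions, not the protocol from the cited paper; `JUDGE_PROMPT`, `judge_factuality`, and `call_judge_model` are hypothetical names introduced only for this example.

```python
# Sketch contrasting the two evaluation signals the claim compares.
# ROUGE-L below is a standard LCS-based F-measure; the judge prompt and
# call_judge_model are hypothetical placeholders, not the method used
# in 'Re-evaluating Hallucination Detection in LLMs'.

def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, tok_a in enumerate(a, 1):
        for j, tok_b in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if tok_a == tok_b else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """Surface-overlap score: rewards shared word sequences, not factual correctness."""
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical judge prompt; the cited paper's actual instructions may differ.
JUDGE_PROMPT = (
    "You are grading factual correctness.\n"
    "Reference answer: {reference}\n"
    "Model answer: {candidate}\n"
    "Reply with exactly one word: SUPPORTED or HALLUCINATED."
)


def judge_factuality(reference: str, candidate: str, call_judge_model) -> str:
    """LLM-as-Judge signal: delegate the factuality decision to a judge model.

    call_judge_model is any callable that sends a prompt string to an LLM
    and returns its text response (e.g. a thin wrapper around an API client).
    """
    prompt = JUDGE_PROMPT.format(reference=reference, candidate=candidate)
    return call_judge_model(prompt).strip()


if __name__ == "__main__":
    reference = "Marie Curie won two Nobel Prizes, in physics and in chemistry."
    candidate = "Marie Curie won two Nobel Prizes, in physics and in literature."
    # High lexical overlap despite the factual error, illustrating why ROUGE
    # can diverge from human judgments of correctness.
    print(f"ROUGE-L F1: {rouge_l_f1(reference, candidate):.2f}")
    # Trivial stand-in judge so the sketch runs without network access.
    mock_judge = lambda prompt: "HALLUCINATED"
    print("Judge verdict:", judge_factuality(reference, candidate, mock_judge))
```

In the example, the hallucinated answer still scores roughly 0.9 on ROUGE-L because almost every word overlaps with the reference, whereas a judge asked directly about factual support can flag the error, which is the gap the claim describes.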