claim
The authors of 'The Illusion of Progress: Re-evaluating Hallucination Detection in LLMs' state that although LLM-as-a-judge is more robust than ROUGE for human-aligned evaluation, it has biases and limitations of its own.
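As background for why ROUGE is the weaker baseline in this comparison, the sketch below computes a simplified ROUGE-L F1 (longest-common-subsequence overlap). It is not taken from the paper. It assumes plain lowercase whitespace tokenization with no stemming, and the example sentences are hypothetical. It shows how a lexical-overlap metric can score a factually wrong answer above a correct paraphrase.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    # Dynamic-programming table for the longest common subsequence of two token lists
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l_f1(candidate: str, reference: str) -> float:
    # Simplified ROUGE-L: assumes whitespace tokenization, lowercasing, no stemming
    c = candidate.lower().split()
    r = reference.lower().split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision = lcs / len(c)
    recall = lcs / len(r)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example sentences
reference = "the eiffel tower is located in paris"
paraphrase = "paris is home to the eiffel tower"   # factually correct, reworded
wrong = "the eiffel tower is located in rome"       # hallucinated, near-identical wording

print(round(rouge_l_f1(paraphrase, reference), 3))  # 0.429
print(round(rouge_l_f1(wrong, reference), 3))       # 0.857
```

The hallucinated answer shares six of seven tokens in order with the reference, so it outscores the correct paraphrase. A judge model that reads for meaning can avoid this particular failure, although, as the claim notes, it introduces biases of its own.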
Sources
- Re-evaluating Hallucination Detection in LLMs - arXiv (arxiv.org)
Referenced by nodes (3)
- LLM-as-a-judge concept
- ROUGE concept
- The Illusion of Progress: Re-evaluating Hallucination Detection in LLMs concept