claim
The authors of 'Re-evaluating Hallucination Detection in LLMs' state that while LLM-as-Judge is more robust than ROUGE for human-aligned evaluation, it is not without its own biases and limitations.
