claim
Hallucination detection methods that perform well under ROUGE often show a substantial performance drop when re-evaluated using the 'LLM-as-Judge' paradigm.
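
A minimal sketch of how such a drop can arise, under stated assumptions. The same detector scores are evaluated twice: once against labels derived from a ROUGE-L threshold, and once against labels standing in for LLM judge verdicts. Everything here is hypothetical and for illustration only: the toy question-answer pairs, the detector scores (chosen to track answer length), the hard-coded judge labels, and the 0.3 threshold. None of it is data or protocol from the source, and length sensitivity is only one possible mechanism.

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 on lowercased, whitespace-split tokens."""
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def auroc(scores, labels):
    """Chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: (reference, model answer, judge label, detector score).
# Judge label 1 = hallucinated; these are hard-coded stand-ins, not real LLM output.
# Detector scores are invented and deliberately track answer length.
examples = [
    ("Paris", "Paris", 0, 0.10),
    ("Paris", "The capital of France is Paris", 0, 0.80),
    ("1969", "1972", 1, 0.20),
    ("Mount Everest", "Mount Everest", 0, 0.15),
    ("William Shakespeare",
     "It was written by the playwright Christopher Marlowe in London", 1, 0.90),
    ("Canberra", "Sydney", 1, 0.30),
    ("H2O", "The chemical formula of water is H2O", 0, 0.70),
]

THRESHOLD = 0.3  # assumed cutoff: ROUGE-L F1 below this is labeled "hallucinated"

scores = [score for _, _, _, score in examples]
judge_labels = [judge for _, _, judge, _ in examples]
rouge_labels = [int(rouge_l_f1(ans, ref) < THRESHOLD) for ref, ans, _, _ in examples]

print("AUROC vs ROUGE labels:", round(auroc(scores, rouge_labels), 3))
print("AUROC vs judge labels:", round(auroc(scores, judge_labels), 3))
```

In this toy setup the two long but correct answers fall below the ROUGE-L threshold and are labeled hallucinated, so the length-tracking detector separates the ROUGE labels perfectly (AUROC 1.0) while scoring about 0.67 against the judge labels.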

