Claim
ROUGE's inherent failure modes, which stem from scoring lexical overlap rather than factual correctness, can produce misleading assessments of both large language model (LLM) responses and the efficacy of hallucination detection techniques.
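
One such failure mode can be illustrated with a minimal sketch. The function below is a simplified unigram-overlap F1 written for illustration, not the official ROUGE implementation (it uses whitespace tokenization and no stemming), and the example sentences are hypothetical. It suggests how a factually wrong response that copies most of the reference wording can outscore a correct paraphrase.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1: a simplified stand-in for ROUGE-1 (assumption:
    lowercase whitespace tokens, no stemming or stopword handling)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical example sentences, chosen only to expose the failure mode.
reference = "the eiffel tower is located in paris"
wrong = "the eiffel tower is located in rome"          # factually wrong, high overlap
paraphrase = "you can find it in the french capital"   # correct, low overlap

score_wrong = rouge1_f1(reference, wrong)
score_paraphrase = rouge1_f1(reference, paraphrase)

print(f"wrong answer:       {score_wrong:.2f}")       # 0.86
print(f"correct paraphrase: {score_paraphrase:.2f}")  # 0.27
```

If ROUGE against a reference is used as the correctness label, the same distortion carries over to any hallucination detector evaluated against those labels.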

Authors

Sources
