claim
The ROUGE metric has three critical failure modes that undermine its utility for hallucination detection: sensitivity to response length, an inability to recognize semantic equivalence, and susceptibility to false lexical matches.
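
The three failure modes can be illustrated with a minimal sketch of unigram overlap (ROUGE-1). This is not the official ROUGE scorer: it assumes lowercase whitespace tokenization with no stemming, and the function name `rouge1` and all example sentences are hypothetical, chosen only to make the effect visible.

```python
from collections import Counter


def rouge1(candidate: str, reference: str) -> dict:
    """Simplified ROUGE-1: clipped unigram overlap (illustrative, not the official scorer)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipped matches).
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical reference answer and three hypothetical model responses.
reference = "the drug was approved in 2019"

# False lexical match: wrong year, but five of six tokens overlap.
hallucinated = rouge1("the drug was approved in 2021", reference)

# Semantic equivalence: same fact, different wording, almost no overlap.
paraphrase = rouge1("regulators cleared this medication during 2019", reference)

# Length sensitivity: correct and complete, but extra tokens lower precision.
verbose = rouge1(
    "the drug was approved in 2019 after a long review by the agency",
    reference,
)

for name, score in [
    ("hallucinated", hallucinated),
    ("paraphrase", paraphrase),
    ("verbose", verbose),
]:
    print(f"{name:13s} F1 = {score['f1']:.3f}")
```

Under these assumptions the factually wrong response receives the highest F1, while the faithful paraphrase and the faithful but longer response both score lower, so thresholding on this score alone would mislabel all three.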
