claim
ROUGE, a metric based on lexical overlap, exhibits high recall but extremely low precision when used for hallucination detection, leading to misleading performance estimates.
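
A minimal sketch of how this pattern can arise. It assumes a detector that flags a response as hallucinated when its unigram-overlap F1 with the reference answer falls below a threshold. The scoring function is a simplified stand-in for ROUGE-1, not the official implementation. The examples, labels, and the 0.5 threshold are invented for illustration and do not come from the source.

```python
import re
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1-style F1 over lowercase word tokens (assumption, not official ROUGE)."""
    cand = re.findall(r"\w+", candidate.lower())
    ref = re.findall(r"\w+", reference.lower())
    # Clipped overlap: each token counts at most as often as it appears in both texts.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    p = overlap / len(cand)
    r = overlap / len(ref)
    return 2 * p * r / (p + r)


def flag_hallucination(candidate: str, reference: str, threshold: float = 0.5) -> bool:
    """Flag a response as hallucinated when lexical overlap is low (hypothetical rule)."""
    return unigram_f1(candidate, reference) < threshold


# Toy data: (reference, model response, is the response actually hallucinated?)
examples = [
    ("Paris", "The capital of France is Paris.", False),          # correct but verbose
    ("1969", "It happened in 1969", False),                       # correct but verbose
    ("blue", "The sky appears blue during the day", False),       # correct but verbose
    ("Canberra", "Australia's capital city is Canberra", False),  # correct but verbose
    ("Mount Everest", "Mount Everest", False),                    # correct, exact match
    ("Marie Curie", "Albert Einstein", True),                     # wrong answer
    ("William Shakespeare", "It was written by Christopher Marlowe", True),
    ("the Pacific Ocean", "The Atlantic ocean is the largest", True),
]

flags = [flag_hallucination(resp, ref) for ref, resp, _ in examples]
labels = [label for _, _, label in examples]

tp = sum(f and l for f, l in zip(flags, labels))        # flagged and truly hallucinated
fp = sum(f and not l for f, l in zip(flags, labels))    # flagged but actually correct
fn = sum((not f) and l for f, l in zip(flags, labels))  # missed hallucinations

precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy setup every truly hallucinated response has low overlap and is caught, but correct answers phrased differently from the reference are flagged as well, so recall stays high while precision drops.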
