claim
Sophisticated metrics, including BERTScore, BLEU, and UniEval-fact, disagree substantially with judgments from strong LLM-based evaluators, which indicates that these metrics are limited in capturing factual consistency.
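
One way such disagreement could be quantified is sketched below, assuming metric scores are thresholded into binary consistent/inconsistent labels and compared with binary verdicts from an LLM-based evaluator. The helper names, the 0.5 threshold, and the toy numbers are hypothetical placeholders chosen for illustration; they are not taken from the source, which does not state how disagreement was measured.

```python
def binarize(scores, threshold):
    """Map continuous metric scores to 1 (consistent) or 0 (inconsistent)."""
    return [1 if s >= threshold else 0 for s in scores]


def disagreement_rate(a, b):
    """Fraction of items on which two binary label sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of binary labels (0/1)."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a1 = sum(a) / n  # share of positive labels from rater a
    p_b1 = sum(b) / n  # share of positive labels from rater b
    expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    if expected == 1.0:
        return 1.0  # both raters give the same constant label
    return (observed - expected) / (1 - expected)


# Toy, made-up values for illustration only (not results from any study).
metric_scores = [0.91, 0.88, 0.35, 0.79, 0.42, 0.95]  # e.g. a similarity metric
judge_labels = [1, 0, 0, 0, 1, 1]                     # e.g. LLM evaluator verdicts

metric_labels = binarize(metric_scores, threshold=0.5)  # threshold is an assumption
rate = disagreement_rate(metric_labels, judge_labels)
kappa = cohen_kappa(metric_labels, judge_labels)
print(f"disagreement rate: {rate:.2f}, Cohen's kappa: {kappa:.2f}")
```

A kappa near zero means the metric agrees with the evaluator no more often than chance would predict. For metrics that are used as continuous scores, a rank correlation would be a reasonable alternative to thresholding.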

Authors

Sources
