claim
Automatic metrics such as BLEU or ROUGE fail to capture factual consistency and reliability in Large Language Models, according to Maynez et al. (2020).

Authors

Sources

Referenced by nodes (4)