claim
Traditional automatic metrics such as BLEU, ROUGE, and METEOR, which score surface overlap between generated text and a reference, are inadequate for assessing the factual consistency of output from large language models, according to Maynez et al. (2020).
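
A minimal sketch of why overlap-based scoring can miss factual errors. The `unigram_f1` function is a hypothetical, simplified stand-in for a ROUGE-1-style score (clipped unigram overlap, whitespace tokenization), not the official ROUGE implementation, and the three sentences are invented for illustration rather than taken from Maynez et al. (2020).

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1-style F1: clipped unigram overlap of two strings."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Toy data (assumed for illustration only).
reference = "the drug reduced mortality in the trial"
faithful = "mortality fell among patients who received the drug"
unfaithful = "the drug increased mortality in the trial"

score_faithful = unigram_f1(faithful, reference)
score_unfaithful = unigram_f1(unfaithful, reference)

print(f"faithful paraphrase:  {score_faithful:.3f}")
print(f"contradictory output: {score_unfaithful:.3f}")
```

In this toy case the contradictory sentence differs from the reference by a single word and so scores higher than the faithful paraphrase, which shares few surface tokens with it. One hand-built example does not establish the claim, but it shows the failure mode the claim describes.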
