claim
Traditional automatic metrics such as BLEU, ROUGE, and METEOR are inadequate for assessing the factual consistency of text generated by large language models, according to Maynez et al. (2020).
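A minimal sketch of why surface-overlap metrics can miss factual errors. It assumes a simplified ROUGE-1-style unigram F1 (the hypothetical helper `unigram_f1`, not the official ROUGE implementation), and the three sentences are invented for illustration rather than taken from Maynez et al. (2020). An output that flips a single fact shares almost every token with the reference, so it outscores a faithful paraphrase.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1-style F1: clipped unigram overlap (illustrative only)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipped overlap).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Invented toy example: one reference and two candidate outputs.
reference = "the company reported a profit of 5 million dollars in 2019"
faithful = "in 2019 the firm earned 5 million dollars"  # same facts, reworded
unfaithful = "the company reported a loss of 5 million dollars in 2019"  # fact flipped

score_faithful = unigram_f1(faithful, reference)      # about 0.63
score_unfaithful = unigram_f1(unfaithful, reference)  # about 0.91

print(f"faithful paraphrase:  {score_faithful:.2f}")
print(f"contradicting output: {score_unfaithful:.2f}")
```

The overlap score rewards shared wording, not shared meaning, which is the gap that dedicated factual consistency evaluation is meant to close.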
Authors
Sources
- Survey and analysis of hallucinations in large language models (www.frontiersin.org)
Referenced by nodes (4)
- Large Language Models concept
- ROUGE concept
- BLEU concept
- factual consistency evaluation concept