claim
Automatic metrics such as BLEU or ROUGE fail to capture the factual consistency and reliability of Large Language Model outputs, according to Maynez et al. (2020).
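To illustrate why n-gram overlap can miss factual errors, here is a minimal sketch of a simplified ROUGE-1 F1 score. It uses clipped unigram overlap on lowercase whitespace tokens, with no stemming or tokenization rules, so it is not the official ROUGE package. The function name `rouge1_f1` and the example sentences are hypothetical, chosen only for illustration.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1 (hypothetical helper, not the official scorer).

    Assumption: lowercase whitespace tokenization only; no stemming,
    stopword removal, or punctuation handling.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count per token (clipped overlap).
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


reference = "the treaty was signed in 1919 in paris"
# Hypothetical outputs: one states the wrong year, one is a faithful paraphrase.
contradicted = "the treaty was signed in 1945 in paris"
paraphrase = "paris hosted the 1919 signing of the agreement"

print(rouge1_f1(reference, contradicted))  # → 0.875
print(rouge1_f1(reference, paraphrase))    # → 0.375
```

Because the score only counts shared tokens, swapping 1919 for 1945 costs a single unigram, while rewording the correct fact loses most of the overlap. An n-gram precision metric such as BLEU would reward the contradicted sentence in the same way.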
Authors
Sources
- Survey and analysis of hallucinations in large language models (www.frontiersin.org)
Referenced by nodes (4)
- Large Language Models concept
- ROUGE concept
- BLEU concept
- factual consistency evaluation concept