claim
Experimental evaluations on benchmarks such as TruthfulQA and HallucinationEval show that LLaMA 2, DeepSeek, and GPT-4 differ in their susceptibility to hallucination.
Authors
Sources
- Survey and analysis of hallucinations in large language models (www.frontiersin.org)
Referenced by nodes (1)
- TruthfulQA concept