claim
Experimental evaluations on benchmarks such as TruthfulQA and HallucinationEval show that LLaMA 2, DeepSeek, and GPT-4 differ in their susceptibility to hallucination.
