measurement
According to 'Survey and analysis of hallucinations in large language models', the overall hallucination rates (HR) of the evaluated LLMs are: LLaMA 2 (13B) at 31.3%, Mistral 7B at 25.8%, DeepSeek 67B at 23.2%, OpenChat-3.5 at 28.4%, and Qwen at 26.7%.
Sources
- Survey and analysis of hallucinations in large language models, www.frontiersin.org