measurement
In QAFactEval experiments, GPT-4 achieved a hallucination rate below 5%, while LLaMA 2 and DeepSeek exhibited hallucination rates between 20% and 25%.
