Relations (1)
related 4.70 — strongly supporting 25 facts
Hallucination rate and average summary length are related in that both are reported as key performance metrics for evaluating large language models, appearing side by side in the per-model comparison data for models such as openai/gpt-5.2-high-2025-12-11 [1], anthropic/claude-opus-4-5-20251101 [2], and others [3] [4].
Facts (25)
Sources
vectara/hallucination-leaderboard (GitHub, github.com): 25 facts
Measurements (all as of March 20, 2026):

| Model | Hallucination Rate | Factual Consistency | Answer Rate | Avg Summary Length (words) |
|---|---|---|---|---|
| moonshotai/Kimi-K2.5 | 14.2% | 85.8% | 92.2% | 112.0 |
| qwen/qwen3.5-27b | 12.1% | 87.9% | 99.8% | 94.4 |
| google/gemini-3-pro-preview | 13.6% | 86.4% | 99.4% | 101.9 |
| anthropic/claude-sonnet-4-5-20250929 | 12.0% | 88.0% | 95.6% | 127.8 |
| openai/gpt-5-mini-2025-08-07 | 12.9% | 87.1% | 99.9% | 169.7 |
| zai-org/glm-4p7 | 11.7% | 88.3% | 99.8% | 70.6 |
| anthropic/claude-opus-4-1-20250805 | 11.8% | 88.2% | 92.4% | 129.1 |
| xai-org/grok-4-1-fast-non-reasoning | 17.8% | 82.2% | 98.5% | 87.5 |
| openai/gpt-5.1-high-2025-11-13 | 12.1% | 87.9% | 100.0% | 254.4 |
| mistralai/mistral-3-large-2512 | 14.5% | 85.5% | 98.8% | 112.7 |
| anthropic/claude-opus-4-6 | 12.2% | 87.8% | 99.8% | 137.6 |
| qwen/qwen3.5-122b-a10b | 11.2% | 88.8% | 99.8% | 86.4 |
| CohereLabs/c4ai-aya-expanse-32b | 10.9% | 89.1% | 99.8% | 112.7 |
| deepseek-ai/DeepSeek-R1 | 11.3% | 88.7% | 97.0% | 93.5 |
| anthropic/claude-opus-4-5-20251101 | 10.9% | 89.1% | 98.7% | 114.5 |
| openai/gpt-oss-120b | 14.2% | 85.8% | 99.9% | 135.2 |
| MiniMaxAI/minimax-m2p1 | 11.8% | 88.2% | 98.5% | 106.9 |
| inceptionlabs/mercury-2 | 12.3% | 87.7% | 100.0% | 149.1 |
| openai/gpt-5-minimal-2025-08-07 | 14.7% | 85.3% | 99.9% | 109.7 |
| anthropic/claude-opus-4-20250514 | 12.0% | 88.0% | 91.0% | 123.2 |
| openai/gpt-5-high-2025-08-07 | 15.1% | 84.9% | 99.9% | 162.7 |
| openai/gpt-5.2-high-2025-12-11 | 10.8% | 89.2% | 100.0% | 186.3 |
| google/gemini-3-flash-preview | 13.5% | 86.5% | 99.8% | 90.2 |
| ai21labs/jamba-mini-1.7-2025-07 | 14.7% | 85.3% | 99.1% | 136.4 |
| openai/gpt-5.1-low-2025-11-13 | 10.9% | 89.1% | 100.0% | 165.5 |