Facts (25)

Sources
vectara/hallucination-leaderboard (Vectara, GitHub: github.com), 25 facts
All measurements as of March 20, 2026.

| Model | Hallucination rate | Factual consistency rate | Answer rate | Avg. summary length (words) |
|---|---|---|---|---|
| moonshotai/Kimi-K2.5 | 14.2% | 85.8% | 92.2% | 112.0 |
| qwen/qwen3.5-27b | 12.1% | 87.9% | 99.8% | 94.4 |
| google/gemini-3-pro-preview | 13.6% | 86.4% | 99.4% | 101.9 |
| anthropic/claude-sonnet-4-5-20250929 | 12.0% | 88.0% | 95.6% | 127.8 |
| openai/gpt-5-mini-2025-08-07 | 12.9% | 87.1% | 99.9% | 169.7 |
| zai-org/glm-4p7 | 11.7% | 88.3% | 99.8% | 70.6 |
| anthropic/claude-opus-4-1-20250805 | 11.8% | 88.2% | 92.4% | 129.1 |
| xai-org/grok-4-1-fast-non-reasoning | 17.8% | 82.2% | 98.5% | 87.5 |
| openai/gpt-5.1-high-2025-11-13 | 12.1% | 87.9% | 100.0% | 254.4 |
| mistralai/mistral-3-large-2512 | 14.5% | 85.5% | 98.8% | 112.7 |
| anthropic/claude-opus-4-6 | 12.2% | 87.8% | 99.8% | 137.6 |
| qwen/qwen3.5-122b-a10b | 11.2% | 88.8% | 99.8% | 86.4 |
| CohereLabs/c4ai-aya-expanse-32b | 10.9% | 89.1% | 99.8% | 112.7 |
| deepseek-ai/DeepSeek-R1 | 11.3% | 88.7% | 97.0% | 93.5 |
| anthropic/claude-opus-4-5-20251101 | 10.9% | 89.1% | 98.7% | 114.5 |
| openai/gpt-oss-120b | 14.2% | 85.8% | 99.9% | 135.2 |
| MiniMaxAI/minimax-m2p1 | 11.8% | 88.2% | 98.5% | 106.9 |
| inceptionlabs/mercury-2 | 12.3% | 87.7% | 100.0% | 149.1 |
| openai/gpt-5-minimal-2025-08-07 | 14.7% | 85.3% | 99.9% | 109.7 |
| anthropic/claude-opus-4-20250514 | 12.0% | 88.0% | 91.0% | 123.2 |
| openai/gpt-5-high-2025-08-07 | 15.1% | 84.9% | 99.9% | 162.7 |
| openai/gpt-5.2-high-2025-12-11 | 10.8% | 89.2% | 100.0% | 186.3 |
| google/gemini-3-flash-preview | 13.5% | 86.5% | 99.8% | 90.2 |
| ai21labs/jamba-mini-1.7-2025-07 | 14.7% | 85.3% | 99.1% | 136.4 |
| openai/gpt-5.1-low-2025-11-13 | 10.9% | 89.1% | 100.0% | 165.5 |
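In these facts, the hallucination rate and the factual consistency rate are complements: each pair sums to 100%. A minimal sketch of working with the data, using a few rows copied from the facts above; the helper functions are hypothetical and not part of any Vectara tooling:

```python
# A small subset of the March 20, 2026 leaderboard facts above.
# Tuple layout: (model, hallucination %, consistency %, answer %, avg words)
LEADERBOARD = [
    ("openai/gpt-5.2-high-2025-12-11", 10.8, 89.2, 100.0, 186.3),
    ("CohereLabs/c4ai-aya-expanse-32b", 10.9, 89.1, 99.8, 112.7),
    ("qwen/qwen3.5-122b-a10b", 11.2, 88.8, 99.8, 86.4),
    ("xai-org/grok-4-1-fast-non-reasoning", 17.8, 82.2, 98.5, 87.5),
]

def check_complement(rows, tol=0.05):
    """Verify that hallucination rate and consistency rate sum to ~100%."""
    return all(abs(h + c - 100.0) <= tol for _, h, c, _, _ in rows)

def rank_by_hallucination(rows):
    """Sort models from lowest (best) to highest hallucination rate."""
    return sorted(rows, key=lambda r: r[1])

assert check_complement(LEADERBOARD)
best = rank_by_hallucination(LEADERBOARD)[0]
print(best[0])  # -> openai/gpt-5.2-high-2025-12-11
```

The complement check is a useful sanity test when scraping the leaderboard, since a row where the two rates do not sum to 100% indicates a transcription error.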