hallucination rate ↔ answer rate

Relations (1)

related 6.04 — strongly supporting 65 facts

Hallucination rate and answer rate are both performance metrics used to evaluate large language models. The Vectara hallucination leaderboard reports them together for each model it measures, such as [1], [2], and [3].
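Because the two metrics are reported side by side, they can be read together rather than ranked separately. The sketch below is illustrative only: the four rows are copied from facts listed in this section, while the helper `lowest_hallucination` and its `min_answer_rate` parameter are hypothetical names introduced here, not part of the leaderboard.

```python
# (model, hallucination rate %, answer rate %), copied from facts in this section.
ROWS = [
    ("xai-org/grok-3", 5.8, 93.0),
    ("google/gemma-3-4b-it", 6.4, 67.3),
    ("openai/gpt-5-nano-2025-08-07", 10.5, 100.0),
    ("xai-org/grok-4-1-fast-non-reasoning", 17.8, 98.5),
]


def lowest_hallucination(rows, min_answer_rate=0.0):
    """Return the row with the lowest hallucination rate among rows
    whose answer rate is at least min_answer_rate, or None if no row qualifies."""
    eligible = [row for row in rows if row[2] >= min_answer_rate]
    if not eligible:
        return None
    return min(eligible, key=lambda row: row[1])


# With no floor, the pick is driven by hallucination rate alone.
print(lowest_hallucination(ROWS))
# Requiring a minimum answer rate can change which model comes out on top.
print(lowest_hallucination(ROWS, min_answer_rate=95.0))
```

A floor on answer rate is only one possible way to combine the two numbers; the source does not prescribe any particular combination.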

Facts (65)

Sources

vectara/hallucination-leaderboard - GitHub github.com Vectara 65 facts

measurement: The openai/gpt-5-nano-2025-08-07 model achieved a hallucination rate of 10.5%, a factual consistency rate of 89.5%, an answer rate of 100.0%, and an average summary length of 105.7 words as of March 20, 2026.

measurement: The moonshotai/Kimi-K2.5 model achieved a hallucination rate of 14.2%, a factual consistency rate of 85.8%, an answer rate of 92.2%, and an average summary length of 112.0 words as of March 20, 2026.

measurement: The mistralai/ministral-8b-2410 model achieved a hallucination rate of 7.4%, a factual consistency rate of 92.6%, an answer rate of 99.9%, and an average summary length of 196.0 words as of March 20, 2026.

measurement: The google/gemini-3.1-flash-lite-preview model achieved a hallucination rate of 8.2%, a factual consistency rate of 91.8%, an answer rate of 99.6%, and an average summary length of 62.6 words as of March 20, 2026.

measurement: The qwen/qwen3.5-27b model achieved a hallucination rate of 12.1%, a factual consistency rate of 87.9%, an answer rate of 99.8%, and an average summary length of 94.4 words as of March 20, 2026.

measurement: The google/gemini-3-pro-preview model achieved a hallucination rate of 13.6%, a factual consistency rate of 86.4%, an answer rate of 99.4%, and an average summary length of 101.9 words as of March 20, 2026.

measurement: The anthropic/claude-sonnet-4-5-20250929 model achieved a hallucination rate of 12.0%, a factual consistency rate of 88.0%, an answer rate of 95.6%, and an average summary length of 127.8 words as of March 20, 2026.

measurement: The xai-org/grok-3 model achieved a hallucination rate of 5.8%, a factual consistency rate of 94.2%, an answer rate of 93.0%, and an average summary length of 95.9 words as of March 20, 2026.

measurement: The openai/gpt-5-mini-2025-08-07 model achieved a hallucination rate of 12.9%, a factual consistency rate of 87.1%, an answer rate of 99.9%, and an average summary length of 169.7 words as of March 20, 2026.

measurement: The zai-org/glm-4p7 model achieved a hallucination rate of 11.7%, a factual consistency rate of 88.3%, an answer rate of 99.8%, and an average summary length of 70.6 words as of March 20, 2026.

measurement: The google/gemini-2.5-pro model achieved a hallucination rate of 7.0%, a factual consistency rate of 93.0%, an answer rate of 99.1%, and an average summary length of 106.4 words as of March 20, 2026.

measurement: The zai-org/GLM-4.5-AIR-FP8 model achieved a hallucination rate of 9.3%, a factual consistency rate of 90.7%, an answer rate of 98.1%, and an average summary length of 70.6 words as of March 20, 2026.

measurement: The anthropic/claude-opus-4-1-20250805 model achieved a hallucination rate of 11.8%, a factual consistency rate of 88.2%, an answer rate of 92.4%, and an average summary length of 129.1 words as of March 20, 2026.

measurement: The xai-org/grok-4-1-fast-non-reasoning model achieved a hallucination rate of 17.8%, a factual consistency rate of 82.2%, an answer rate of 98.5%, and an average summary length of 87.5 words as of March 20, 2026.

measurement: The meta-llama/Llama-4-Scout-17B-16E-Instruct model achieved a hallucination rate of 7.7%, a factual consistency rate of 92.3%, an answer rate of 99.0%, and an average summary length of 137.3 words as of March 20, 2026.

measurement: The anthropic/claude-sonnet-4-6 model achieved a hallucination rate of 10.6%, a factual consistency rate of 89.4%, an answer rate of 99.9%, and an average summary length of 114.7 words as of March 20, 2026.

measurement: The openai/gpt-5.1-high-2025-11-13 model achieved a hallucination rate of 12.1%, a factual consistency rate of 87.9%, an answer rate of 100.0%, and an average summary length of 254.4 words as of March 20, 2026.

measurement: The qwen/qwen3.5-plus-2026-02-15 model achieved a hallucination rate of 10.7%, a factual consistency rate of 89.3%, an answer rate of 99.8%, and an average summary length of 92.1 words as of March 20, 2026.

measurement: The google/gemini-3.1-pro-preview model achieved a hallucination rate of 10.4%, a factual consistency rate of 89.6%, an answer rate of 99.4%, and an average summary length of 107.7 words as of March 20, 2026.

measurement: The qwen/qwen3-32b model achieved a hallucination rate of 5.9%, a factual consistency rate of 94.1%, an answer rate of 99.9%, and an average summary length of 115.8 words as of March 20, 2026.

measurement: The arcee-ai/trinity-large-preview model achieved a hallucination rate of 6.9%, a factual consistency rate of 93.1%, an answer rate of 99.0%, and an average summary length of 117.3 words as of March 20, 2026.

measurement: The anthropic/claude-sonnet-4-20250514 model achieved a hallucination rate of 10.3%, a factual consistency rate of 89.7%, an answer rate of 98.6%, and an average summary length of 145.8 words as of March 20, 2026.

measurement: The deepseek-ai/DeepSeek-V3 model achieved a hallucination rate of 6.1%, a factual consistency rate of 93.9%, an answer rate of 97.5%, and an average summary length of 81.7 words as of March 20, 2026.

measurement: The openai/gpt-4o-2024-08-06 model achieved a hallucination rate of 9.6%, a factual consistency rate of 90.4%, an answer rate of 93.8%, and an average summary length of 86.6 words as of March 20, 2026.

measurement: The google/gemma-3-4b-it model achieved a hallucination rate of 6.4%, a factual consistency rate of 93.6%, an answer rate of 67.3%, and an average summary length of 77.4 words as of March 20, 2026.

measurement: The deepseek-ai/DeepSeek-V3.2 model achieved a hallucination rate of 6.3%, a factual consistency rate of 93.7%, an answer rate of 92.6%, and an average summary length of 62.0 words as of March 20, 2026.

measurement: The mistralai/mistral-3-large-2512 model achieved a hallucination rate of 14.5%, a factual consistency rate of 85.5%, an answer rate of 98.8%, and an average summary length of 112.7 words as of March 20, 2026.

measurement: The ai21labs/jamba-large-1.7-2025-07 model achieved a hallucination rate of 9.7%, a factual consistency rate of 90.3%, an answer rate of 98.9%, and an average summary length of 124.8 words as of March 20, 2026.

measurement: The qwen/qwen3.5-flash-2026-02-23 model achieved a hallucination rate of 10.5%, a factual consistency rate of 89.5%, an answer rate of 99.8%, and an average summary length of 95.0 words as of March 20, 2026.

measurement: The google/gemma-3-27b-it model achieved a hallucination rate of 7.4%, a factual consistency rate of 92.6%, an answer rate of 98.8%, and an average summary length of 96.4 words as of March 20, 2026.

measurement: The anthropic/claude-opus-4-6 model achieved a hallucination rate of 12.2%, a factual consistency rate of 87.8%, an answer rate of 99.8%, and an average summary length of 137.6 words as of March 20, 2026.

measurement: The CohereLabs/c4ai-aya-expanse-8b model achieved a hallucination rate of 9.5%, a factual consistency rate of 90.5%, an answer rate of 77.5%, and an average summary length of 88.2 words as of March 20, 2026.

measurement: The qwen/qwen3.5-122b-a10b model achieved a hallucination rate of 11.2%, a factual consistency rate of 88.8%, an answer rate of 99.8%, and an average summary length of 86.4 words as of March 20, 2026.

measurement: The CohereLabs/c4ai-aya-expanse-32b model achieved a hallucination rate of 10.9%, a factual consistency rate of 89.1%, an answer rate of 99.8%, and an average summary length of 112.7 words as of March 20, 2026.

measurement: The deepseek-ai/DeepSeek-R1 model achieved a hallucination rate of 11.3%, a factual consistency rate of 88.7%, an answer rate of 97.0%, and an average summary length of 93.5 words as of March 20, 2026.

measurement: The MiniMaxAI/minimax-m2p5 model achieved a hallucination rate of 9.1%, a factual consistency rate of 90.9%, an answer rate of 98.2%, and an average summary length of 137.2 words as of March 20, 2026.

measurement: The anthropic/claude-opus-4-5-20251101 model achieved a hallucination rate of 10.9%, a factual consistency rate of 89.1%, an answer rate of 98.7%, and an average summary length of 114.5 words as of March 20, 2026.

measurement: The openai/gpt-oss-120b model achieved a hallucination rate of 14.2%, a factual consistency rate of 85.8%, an answer rate of 99.9%, and an average summary length of 135.2 words as of March 20, 2026.

measurement: The MiniMaxAI/minimax-m2p1 model achieved a hallucination rate of 11.8%, a factual consistency rate of 88.2%, an answer rate of 98.5%, and an average summary length of 106.9 words as of March 20, 2026.

measurement: The openai/gpt-5.4-pro-2026-03-05 model achieved a hallucination rate of 8.3%, a factual consistency rate of 91.7%, an answer rate of 100.0%, and an average summary length of 148.5 words as of March 20, 2026.

measurement: The amazon/nova-lite-v1:0 model achieved a hallucination rate of 6.1%, a factual consistency rate of 93.9%, an answer rate of 99.9%, and an average summary length of 91.8 words as of March 20, 2026.

measurement: The openai/gpt-5.2-low-2025-12-11 model achieved a hallucination rate of 8.4%, a factual consistency rate of 91.6%, an answer rate of 100.0%, and an average summary length of 126.5 words as of March 20, 2026.

measurement: The anthropic/claude-haiku-4-5-20251001 model achieved a hallucination rate of 9.8%, a factual consistency rate of 90.2%, an answer rate of 99.5%, and an average summary length of 115.1 words as of March 20, 2026.

measurement: The qwen/qwen3-next-80b-a3b-thinking model achieved a hallucination rate of 9.3%, a factual consistency rate of 90.7%, an answer rate of 94.4%, and an average summary length of 70.9 words as of March 20, 2026.

measurement: The nvidia/Nemotron-3-Nano-30B-A3B model achieved a hallucination rate of 9.6%, a factual consistency rate of 90.4%, an answer rate of 99.6%, and an average summary length of 104.2 words as of March 20, 2026.

measurement: The inceptionlabs/mercury-2 model achieved a hallucination rate of 12.3%, a factual consistency rate of 87.7%, an answer rate of 100.0%, and an average summary length of 149.1 words as of March 20, 2026.

measurement: The openai/gpt-5-minimal-2025-08-07 model achieved a hallucination rate of 14.7%, a factual consistency rate of 85.3%, an answer rate of 99.9%, and an average summary length of 109.7 words as of March 20, 2026.

measurement: The mistralai/ministral-3b-2410 model achieved a hallucination rate of 7.3%, a factual consistency rate of 92.7%, an answer rate of 99.9%, and an average summary length of 167.9 words as of March 20, 2026.

measurement: The anthropic/claude-opus-4-20250514 model achieved a hallucination rate of 12.0%, a factual consistency rate of 88.0%, an answer rate of 91.0%, and an average summary length of 123.2 words as of March 20, 2026.

measurement: The openai/gpt-5-high-2025-08-07 model achieved a hallucination rate of 15.1%, a factual consistency rate of 84.9%, an answer rate of 99.9%, and an average summary length of 162.7 words as of March 20, 2026.

measurement: The CohereLabs/command-a-03-2025 model achieved a hallucination rate of 9.3%, a factual consistency rate of 90.7%, an answer rate of 97.6%, and an average summary length of 101.7 words as of March 20, 2026.

measurement: The openai/gpt-5.4-2026-03-05 model achieved a hallucination rate of 7.0%, a factual consistency rate of 93.0%, an answer rate of 99.9%, and an average summary length of 81.7 words as of March 20, 2026.

measurement: The google/gemini-2.5-flash model achieved a hallucination rate of 7.8%, a factual consistency rate of 92.2%, an answer rate of 99.0%, and an average summary length of 101.5 words as of March 20, 2026.

measurement: The zai-org/GLM-4.6 model achieved a hallucination rate of 9.5%, a factual consistency rate of 90.5%, an answer rate of 94.5%, and an average summary length of 77.2 words as of March 20, 2026.

measurement: The ibm-granite/granite-3.3-8b-instruct model achieved a hallucination rate of 10.6%, a factual consistency rate of 89.4%, an answer rate of 100.0%, and an average summary length of 131.4 words as of March 20, 2026.

measurement: The zai-org/glm-5 model achieved a hallucination rate of 10.1%, a factual consistency rate of 89.9%, an answer rate of 99.7%, and an average summary length of 74.4 words as of March 20, 2026.

measurement: The openai/gpt-5.2-high-2025-12-11 model achieved a hallucination rate of 10.8%, a factual consistency rate of 89.2%, an answer rate of 100.0%, and an average summary length of 186.3 words as of March 20, 2026.

measurement: The google/gemini-3-flash-preview model achieved a hallucination rate of 13.5%, a factual consistency rate of 86.5%, an answer rate of 99.8%, and an average summary length of 90.2 words as of March 20, 2026.

measurement: The CohereLabs/command-r-plus-08-2024 model achieved a hallucination rate of 6.9%, a factual consistency rate of 93.1%, an answer rate of 95.0%, and an average summary length of 91.5 words as of March 20, 2026.

measurement: The zai-org/GLM-4.7-flash model achieved a hallucination rate of 9.3%, a factual consistency rate of 90.7%, an answer rate of 91.6%, and an average summary length of 71.8 words as of March 20, 2026.

measurement: The meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 model achieved a hallucination rate of 8.2%, a factual consistency rate of 91.8%, an answer rate of 100.0%, and an average summary length of 106.0 words as of March 20, 2026.

measurement: The qwen/qwen3-235b-a22b model achieved a hallucination rate of 9.3%, a factual consistency rate of 90.7%, an answer rate of 94.9%, and an average summary length of 105.6 words as of March 20, 2026.

measurement: The qwen/qwen3.5-35b-a3b model achieved a hallucination rate of 10.5%, a factual consistency rate of 89.5%, an answer rate of 99.8%, and an average summary length of 94.9 words as of March 20, 2026.

measurement: The ai21labs/jamba-mini-1.7-2025-07 model achieved a hallucination rate of 14.7%, a factual consistency rate of 85.3%, an answer rate of 99.1%, and an average summary length of 136.4 words as of March 20, 2026.

measurement: The openai/gpt-5.1-low-2025-11-13 model achieved a hallucination rate of 10.9%, a factual consistency rate of 89.1%, an answer rate of 100.0%, and an average summary length of 165.5 words as of March 20, 2026.
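
Every fact above follows the same sentence template, so the figures can be pulled into records for comparison. The following is a minimal sketch, assuming that template holds for every line: the regular expression, the `Measurement` type, and the helpers `parse_fact` and `rates_are_complementary` are hypothetical names introduced here. The check that hallucination rate and factual consistency rate sum to 100% reflects a pattern observed in the listed facts, not a definition taken from the source.

```python
import re
from typing import NamedTuple, Optional


class Measurement(NamedTuple):
    model: str
    hallucination_rate: float
    factual_consistency_rate: float
    answer_rate: float
    avg_summary_words: float


# Pattern inferred from the sentence template used by the facts in this section.
FACT_RE = re.compile(
    r"The (?P<model>\S+) model achieved a hallucination rate of (?P<hall>[\d.]+)%, "
    r"a factual consistency rate of (?P<fact>[\d.]+)%, "
    r"an answer rate of (?P<ans>[\d.]+)%, "
    r"and an average summary length of (?P<words>[\d.]+) words"
)


def parse_fact(line: str) -> Optional[Measurement]:
    """Parse one fact sentence; return None if it does not match the template."""
    match = FACT_RE.search(line)
    if match is None:
        return None
    return Measurement(
        model=match["model"],
        hallucination_rate=float(match["hall"]),
        factual_consistency_rate=float(match["fact"]),
        answer_rate=float(match["ans"]),
        avg_summary_words=float(match["words"]),
    )


def rates_are_complementary(rec: Measurement, tol: float = 0.05) -> bool:
    """Check that the two rates sum to 100% within a rounding tolerance."""
    total = rec.hallucination_rate + rec.factual_consistency_rate
    return abs(total - 100.0) <= tol


example = (
    "The google/gemma-3-4b-it model achieved a hallucination rate of 6.4%, "
    "a factual consistency rate of 93.6%, an answer rate of 67.3%, and an "
    "average summary length of 77.4 words as of March 20, 2026."
)
record = parse_fact(example)
print(record)
print(rates_are_complementary(record))
```

Using `search` rather than `match` lets the parser ignore whatever label precedes the sentence, so it does not depend on how the fact-type tag is rendered.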