hallucination rate ↔ average summary length

Relations (1)

related 5.36 — strongly supporting 40 facts

These concepts are related because both are standard metrics for evaluating large language models, and they are reported together for each model, for example [1], [2], and [3].

Facts (40)

Sources

vectara/hallucination-leaderboard (GitHub, github.com, Vectara), 40 facts

measurement: The openai/gpt-5-nano-2025-08-07 model achieved a hallucination rate of 10.5%, a factual consistency rate of 89.5%, an answer rate of 100.0%, and an average summary length of 105.7 words as of March 20, 2026.

measurement: The mistralai/ministral-8b-2410 model achieved a hallucination rate of 7.4%, a factual consistency rate of 92.6%, an answer rate of 99.9%, and an average summary length of 196.0 words as of March 20, 2026.

measurement: The google/gemini-3.1-flash-lite-preview model achieved a hallucination rate of 8.2%, a factual consistency rate of 91.8%, an answer rate of 99.6%, and an average summary length of 62.6 words as of March 20, 2026.

measurement: The xai-org/grok-3 model achieved a hallucination rate of 5.8%, a factual consistency rate of 94.2%, an answer rate of 93.0%, and an average summary length of 95.9 words as of March 20, 2026.

measurement: The google/gemini-2.5-pro model achieved a hallucination rate of 7.0%, a factual consistency rate of 93.0%, an answer rate of 99.1%, and an average summary length of 106.4 words as of March 20, 2026.

measurement: The zai-org/GLM-4.5-AIR-FP8 model achieved a hallucination rate of 9.3%, a factual consistency rate of 90.7%, an answer rate of 98.1%, and an average summary length of 70.6 words as of March 20, 2026.

measurement: The meta-llama/Llama-4-Scout-17B-16E-Instruct model achieved a hallucination rate of 7.7%, a factual consistency rate of 92.3%, an answer rate of 99.0%, and an average summary length of 137.3 words as of March 20, 2026.

measurement: The anthropic/claude-sonnet-4-6 model achieved a hallucination rate of 10.6%, a factual consistency rate of 89.4%, an answer rate of 99.9%, and an average summary length of 114.7 words as of March 20, 2026.

measurement: The qwen/qwen3.5-plus-2026-02-15 model achieved a hallucination rate of 10.7%, a factual consistency rate of 89.3%, an answer rate of 99.8%, and an average summary length of 92.1 words as of March 20, 2026.

measurement: The google/gemini-3.1-pro-preview model achieved a hallucination rate of 10.4%, a factual consistency rate of 89.6%, an answer rate of 99.4%, and an average summary length of 107.7 words as of March 20, 2026.

measurement: The qwen/qwen3-32b model achieved a hallucination rate of 5.9%, a factual consistency rate of 94.1%, an answer rate of 99.9%, and an average summary length of 115.8 words as of March 20, 2026.

measurement: The arcee-ai/trinity-large-preview model achieved a hallucination rate of 6.9%, a factual consistency rate of 93.1%, an answer rate of 99.0%, and an average summary length of 117.3 words as of March 20, 2026.

measurement: The anthropic/claude-sonnet-4-20250514 model achieved a hallucination rate of 10.3%, a factual consistency rate of 89.7%, an answer rate of 98.6%, and an average summary length of 145.8 words as of March 20, 2026.

measurement: The deepseek-ai/DeepSeek-V3 model achieved a hallucination rate of 6.1%, a factual consistency rate of 93.9%, an answer rate of 97.5%, and an average summary length of 81.7 words as of March 20, 2026.

measurement: The openai/gpt-4o-2024-08-06 model achieved a hallucination rate of 9.6%, a factual consistency rate of 90.4%, an answer rate of 93.8%, and an average summary length of 86.6 words as of March 20, 2026.

measurement: The google/gemma-3-4b-it model achieved a hallucination rate of 6.4%, a factual consistency rate of 93.6%, an answer rate of 67.3%, and an average summary length of 77.4 words as of March 20, 2026.

measurement: The deepseek-ai/DeepSeek-V3.2 model achieved a hallucination rate of 6.3%, a factual consistency rate of 93.7%, an answer rate of 92.6%, and an average summary length of 62.0 words as of March 20, 2026.

measurement: The ai21labs/jamba-large-1.7-2025-07 model achieved a hallucination rate of 9.7%, a factual consistency rate of 90.3%, an answer rate of 98.9%, and an average summary length of 124.8 words as of March 20, 2026.

measurement: The qwen/qwen3.5-flash-2026-02-23 model achieved a hallucination rate of 10.5%, a factual consistency rate of 89.5%, an answer rate of 99.8%, and an average summary length of 95.0 words as of March 20, 2026.

measurement: The google/gemma-3-27b-it model achieved a hallucination rate of 7.4%, a factual consistency rate of 92.6%, an answer rate of 98.8%, and an average summary length of 96.4 words as of March 20, 2026.

measurement: The CohereLabs/c4ai-aya-expanse-8b model achieved a hallucination rate of 9.5%, a factual consistency rate of 90.5%, an answer rate of 77.5%, and an average summary length of 88.2 words as of March 20, 2026.

measurement: The MiniMaxAI/minimax-m2p5 model achieved a hallucination rate of 9.1%, a factual consistency rate of 90.9%, an answer rate of 98.2%, and an average summary length of 137.2 words as of March 20, 2026.

measurement: The openai/gpt-5.4-pro-2026-03-05 model achieved a hallucination rate of 8.3%, a factual consistency rate of 91.7%, an answer rate of 100.0%, and an average summary length of 148.5 words as of March 20, 2026.

measurement: The amazon/nova-lite-v1:0 model achieved a hallucination rate of 6.1%, a factual consistency rate of 93.9%, an answer rate of 99.9%, and an average summary length of 91.8 words as of March 20, 2026.

measurement: The openai/gpt-5.2-low-2025-12-11 model achieved a hallucination rate of 8.4%, a factual consistency rate of 91.6%, an answer rate of 100.0%, and an average summary length of 126.5 words as of March 20, 2026.

measurement: The anthropic/claude-haiku-4-5-20251001 model achieved a hallucination rate of 9.8%, a factual consistency rate of 90.2%, an answer rate of 99.5%, and an average summary length of 115.1 words as of March 20, 2026.

measurement: The qwen/qwen3-next-80b-a3b-thinking model achieved a hallucination rate of 9.3%, a factual consistency rate of 90.7%, an answer rate of 94.4%, and an average summary length of 70.9 words as of March 20, 2026.

measurement: The nvidia/Nemotron-3-Nano-30B-A3B model achieved a hallucination rate of 9.6%, a factual consistency rate of 90.4%, an answer rate of 99.6%, and an average summary length of 104.2 words as of March 20, 2026.

measurement: The mistralai/ministral-3b-2410 model achieved a hallucination rate of 7.3%, a factual consistency rate of 92.7%, an answer rate of 99.9%, and an average summary length of 167.9 words as of March 20, 2026.

measurement: The CohereLabs/command-a-03-2025 model achieved a hallucination rate of 9.3%, a factual consistency rate of 90.7%, an answer rate of 97.6%, and an average summary length of 101.7 words as of March 20, 2026.

measurement: The openai/gpt-5.4-2026-03-05 model achieved a hallucination rate of 7.0%, a factual consistency rate of 93.0%, an answer rate of 99.9%, and an average summary length of 81.7 words as of March 20, 2026.

measurement: The google/gemini-2.5-flash model achieved a hallucination rate of 7.8%, a factual consistency rate of 92.2%, an answer rate of 99.0%, and an average summary length of 101.5 words as of March 20, 2026.

measurement: The zai-org/GLM-4.6 model achieved a hallucination rate of 9.5%, a factual consistency rate of 90.5%, an answer rate of 94.5%, and an average summary length of 77.2 words as of March 20, 2026.

measurement: The ibm-granite/granite-3.3-8b-instruct model achieved a hallucination rate of 10.6%, a factual consistency rate of 89.4%, an answer rate of 100.0%, and an average summary length of 131.4 words as of March 20, 2026.

measurement: The zai-org/glm-5 model achieved a hallucination rate of 10.1%, a factual consistency rate of 89.9%, an answer rate of 99.7%, and an average summary length of 74.4 words as of March 20, 2026.

measurement: The CohereLabs/command-r-plus-08-2024 model achieved a hallucination rate of 6.9%, a factual consistency rate of 93.1%, an answer rate of 95.0%, and an average summary length of 91.5 words as of March 20, 2026.

measurement: The zai-org/GLM-4.7-flash model achieved a hallucination rate of 9.3%, a factual consistency rate of 90.7%, an answer rate of 91.6%, and an average summary length of 71.8 words as of March 20, 2026.

measurement: The meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 model achieved a hallucination rate of 8.2%, a factual consistency rate of 91.8%, an answer rate of 100.0%, and an average summary length of 106.0 words as of March 20, 2026.

measurement: The qwen/qwen3-235b-a22b model achieved a hallucination rate of 9.3%, a factual consistency rate of 90.7%, an answer rate of 94.9%, and an average summary length of 105.6 words as of March 20, 2026.

measurement: The qwen/qwen3.5-35b-a3b model achieved a hallucination rate of 10.5%, a factual consistency rate of 89.5%, an answer rate of 99.8%, and an average summary length of 94.9 words as of March 20, 2026.
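
The facts above report the two metrics side by side but do not say whether they move together. A minimal sketch of how one might check this, assuming a six-model subset copied from the measurements above and a hand-written Pearson coefficient (the names `ROWS`, `pearson`, and `is_complementary` are illustrative and not part of the leaderboard), could look like this. It also checks that each listed factual consistency rate is 100 minus the hallucination rate, which holds for the figures shown here.

```python
from math import sqrt

# Subset of the measurements listed above (values as of March 20, 2026).
# Columns: model, hallucination rate (%), factual consistency rate (%),
# average summary length (words). Extend with the remaining rows as needed.
ROWS = [
    ("openai/gpt-5-nano-2025-08-07", 10.5, 89.5, 105.7),
    ("mistralai/ministral-8b-2410", 7.4, 92.6, 196.0),
    ("google/gemini-3.1-flash-lite-preview", 8.2, 91.8, 62.6),
    ("xai-org/grok-3", 5.8, 94.2, 95.9),
    ("google/gemini-2.5-pro", 7.0, 93.0, 106.4),
    ("deepseek-ai/DeepSeek-V3.2", 6.3, 93.7, 62.0),
]


def is_complementary(hallucination, consistency, tol=1e-6):
    """Return True if the two percentages sum to 100 within a tolerance."""
    return abs(hallucination + consistency - 100.0) < tol


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


rates = [row[1] for row in ROWS]
lengths = [row[3] for row in ROWS]
all_complementary = all(is_complementary(row[1], row[2]) for row in ROWS)
r = pearson(rates, lengths)

if __name__ == "__main__":
    print(f"consistency = 100 - hallucination for all rows: {all_complementary}")
    print(f"Pearson r (hallucination rate vs. summary length): {r:.3f}")
```

A coefficient computed from six rows is only illustrative; a meaningful estimate would use all 40 measurements, and even then a correlation would not show that summary length causes hallucination.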