DeepSeek-R1
Also known as: DeepSeek-R1-0528, DeepSeek R1, deepseek-ai/DeepSeek-R1
Facts (25)
Sources
Medical Hallucination in Foundation Models and Their Impact on ... medrxiv.org Nov 2, 2025 6 facts
measurement: The DeepSeek-R1 model's performance decreased when using search augmentation, dropping from an 86.6% baseline to 84.3% (a 2.3-percentage-point drop).
claim: The authors conducted experimental analyses of medical hallucinations across general practice, oncology, cardiology, and medical education using GPT-5, Gemini-2.5 Pro, DeepSeek-R1, and MedGemma.
measurement: The models gemini-2.5-pro, o3-mini, and deepseek-r1 cluster in the high semantic-similarity range of 0.8-0.9, indicating strong semantic alignment with ground-truth medical information.
measurement: The model o3-mini achieves a hallucination-resistance baseline of 80.4%, while deepseek-r1 achieves a baseline of 86.6%.
measurement: System prompting provides gains complementary to Chain-of-Thought (CoT) reasoning in reducing hallucination rates, as seen in o3-mini (baseline 80.4%, system prompt 81.4%, CoT 90.7%) and deepseek-r1 (baseline 86.6%, system prompt 84.5%, CoT 90.7%).
claim: DeepSeek-R1 is a reasoning-optimized LLM that employs large-scale reinforcement learning on scientific and mathematical tasks to enhance logical consistency and reduce confabulation.
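The "semantic similarity" scores cited above are typically cosine similarities between embedding vectors of a model's answer and the ground-truth text. The paper does not specify its embedding model, so the vectors below are made-up stand-ins; this is a minimal sketch of the metric itself, not the study's pipeline.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical embeddings of a model answer and the reference answer.
model_out = [0.9, 0.1, 0.3]
ground_truth = [1.0, 0.0, 0.25]
print(round(cosine(model_out, ground_truth), 3))
```

A score in the 0.8-0.9 band means the answer's embedding points in nearly the same direction as the reference's, which is why it is read as strong semantic agreement.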
Medical Hallucination in Foundation Models and Their ... medrxiv.org Mar 3, 2025 5 facts
claim: System prompting offers noticeable improvements in reducing hallucination rates and works synergistically with Chain-of-Thought (CoT) prompting, particularly in models such as o3-mini and deepseek-r1.
claim: Advanced general-purpose models such as deepseek-r1 and o3-mini outperform domain-specific models on medical tasks, suggesting that broad language understanding and reasoning capabilities matter more for reliability than domain-specific training alone.
claim: Highly advanced architectures such as deepseek-r1 and o3-mini show less pronounced gains from search augmentation, indicating an increasing reliance on their internal knowledge base for accuracy.
claim: The models gemini-2.0 and deepseek-r1 demonstrate robust hallucination resistance, placing them alongside o1-preview and ahead of earlier models.
measurement: The highest-performing models, including gemini-2.0-thinking, gemini-2.0, and deepseek-r1, cluster in the high similarity-score range of 0.7-0.9, indicating strong semantic alignment of their outputs with ground-truth medical information.
A Comprehensive Benchmark and Evaluation Framework for Multi ... arxiv.org Jan 6, 2026 5 facts
claim: Reasoning-enhanced models such as DeepSeek-R1 and GPT-o3-mini show higher inter-rater reliability with human experts than standard instruction-tuned models.
measurement: The study benchmarks two open-source models (Qwen3-235B-A22B-Instruct-2507 and DeepSeek-R1) and two proprietary models (GPT-5 and Gemini-2.5-Pro) to assess inquiry completeness in clinical contexts.
reference: DeepSeek-AI published the DeepSeek-R1 technical report in 2025, detailing the use of reinforcement learning to incentivize reasoning capabilities in large language models.
measurement: DeepSeek-R1's diagnostic accuracy plateaus near 40% in the evaluated consultations.
claim: Models such as Qwen3-235B-A22B-Instruct-2507 and DeepSeek-R1 plateau early in diagnostic reasoning because they fail to dynamically update their differential diagnosis to ask the next most relevant question.
Unlocking the Potential of Generative AI through Neuro-Symbolic ... arxiv.org Feb 16, 2025 3 facts
reference: The paper 'Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning' was published as an arXiv preprint in 2025.
claim: In the DeepSeek-R1 framework, reinforcement learning rewards and symbolic constraints coordinate specialized experts, allowing for efficient resource utilization and adherence to reasoning rules.
reference: The DeepSeek-R1 framework utilizes a Mixture-of-Experts (MoE) architecture to enhance reasoning capabilities in large-scale AI systems by activating only a subset of parameters for each task.
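The MoE fact above can be illustrated with a generic top-k routing sketch. This is not DeepSeek-R1's actual implementation (the expert functions, gate weights, and expert count here are invented for illustration); it only shows the core idea that a gate scores all experts but runs just the k highest-scoring ones, so most parameters stay inactive per input.

```python
import math
import random

random.seed(0)
NUM_EXPERTS, TOP_K, DIM = 8, 2, 4

# Hypothetical gate weights and toy expert functions (stand-ins for
# the per-expert feed-forward networks in a real MoE layer).
gate_w = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
experts = [lambda x, s=i: [v * (s + 1) for v in x] for i in range(NUM_EXPERTS)]

def moe_forward(x):
    # Gate: score every expert, but only run the TOP_K best.
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_w]
    top = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    # Softmax over the selected experts only; the rest stay inactive.
    exp_s = [math.exp(scores[i]) for i in top]
    z = sum(exp_s)
    out = [0.0] * DIM
    for i, e in zip(top, exp_s):
        y = experts[i](x)  # only TOP_K expert computations happen
        out = [o + (e / z) * yi for o, yi in zip(out, y)]
    return top, out

active, y = moe_forward([1.0, 0.5, -0.5, 2.0])
print(len(active), "of", NUM_EXPERTS, "experts activated")
```

The design point the fact highlights is exactly this sparsity: compute scales with TOP_K rather than with the total expert count, which is how a 685B-parameter model can run far fewer parameters per token.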
vectara/hallucination-leaderboard - GitHub github.com 2 facts
reference: The Vectara hallucination leaderboard integrates DeepSeek V3, DeepSeek V3.1, DeepSeek V3.2-Exp, and DeepSeek R1 via the Hugging Face inference provider.
measurement: The deepseek-ai/DeepSeek-R1 model achieved a hallucination rate of 11.3%, a factual consistency rate of 88.7%, an answer rate of 97.0%, and an average summary length of 93.5 words as of March 20, 2026.
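The two headline figures in this measurement are complementary: 11.3% + 88.7% = 100%, consistent with the leaderboard reporting factual consistency as simply the share of summaries not flagged as hallucinated. A one-line sanity check, assuming that complement relationship holds:

```python
def factual_consistency(hallucination_rate_pct: float) -> float:
    # Assumes factual consistency = 100% minus the hallucination rate,
    # which the reported DeepSeek-R1 figures (11.3% / 88.7%) satisfy.
    return round(100.0 - hallucination_rate_pct, 1)

print(factual_consistency(11.3))  # 88.7
```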
A Survey on the Theory and Mechanism of Large Language Models arxiv.org Mar 12, 2026 1 fact
reference: The paper 'Deepseek-r1: incentivizing reasoning capability in llms via reinforcement learning' (arXiv:2501.12948) is cited in the survey 'A Survey on the Theory and Mechanism of Large Language Models' regarding reasoning capabilities.
A Knowledge Graph-Based Hallucination Benchmark for Evaluating ... arxiv.org Feb 23, 2026 1 fact
measurement: DeepSeek-R1-0528 has a knowledge cut-off of 01/2025, a size of 685 billion parameters, and an abstain rate of 48.45%.
The Impact of Open Source on Digital Innovation linkedin.com 1 fact
quote: Yann LeCun stated that the ability of the DeepSeek R1 model to match or exceed the performance of the OpenAI o1 model on key benchmarks is a testament to the role of open source in driving innovation, rather than a simplistic East versus West narrative.
Reference Hallucination Score for Medical Artificial ... medinform.jmir.org Jul 31, 2024 1 fact
reference: Gao Z, Li J, and Fang W authored 'Dietary guidance for pregnant women using DeepSeek-R1 and ChatGPT-4.0: a comparative analysis', published in Frontiers in Public Health in 2026, volume 14.