OpenAI
Also known as: open-ai
Facts (80)
Sources
Medical Hallucination in Foundation Models and Their ... medrxiv.org Mar 3, 2025 8 facts
claim: OpenAI's GPT-4o model, released in May 2024, is a multimodal model capable of processing and generating text, images, and audio with enhanced reasoning and factual accuracy.
procedure: The procedure for refining text for completeness and structure involves prompting OpenAI’s GPT-4o to use text extracted by pdfminer to restore missing text from Marker-extracted content, while ensuring the final output is ordered and in Markdown format.
claim: Prominent large language models include OpenAI’s GPT series, Google’s Gemini, Anthropic’s Claude, and Meta’s Llama family.
claim: OpenAI's GPT-4o-mini model, released in July 2024, is a smaller, cost-effective version of GPT-4o that maintains strong performance with greater efficiency.
claim: OpenAI's o1-preview model was introduced in September 2024 and is designed to spend more time thinking before responding to enhance reasoning capabilities for complex tasks.
claim: OpenAI's o3-mini model was introduced in January 2025 and is designed to spend more time thinking before responding to enhance reasoning capabilities for complex tasks.
procedure: The procedure for providing summaries of extracted images involves using the multimodal capability of OpenAI’s GPT-4o to generate concise summaries for critical visual content in case records.
procedure: The procedure for handling missing tables in medical case records involves: (1) prompting OpenAI’s GPT-4o model to identify missing tables in text extracted by Marker, (2) re-parsing the document with Marker if the model detects missing tables, and (3) limiting this verification process to a maximum of four trials.
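The missing-table verification procedure above is a bounded retry loop. A minimal sketch, assuming hypothetical stand-in functions (`parse_with_marker` for the Marker extraction step and `detect_missing_tables` for the GPT-4o check; neither name comes from the paper):

```python
# Hedged sketch of the missing-table verification loop described above.
# `parse_with_marker` and `detect_missing_tables` are hypothetical
# placeholders for the Marker PDF parser and the GPT-4o prompt step.

MAX_TRIALS = 4  # the procedure caps verification at four trials


def parse_with_marker(pdf_path: str) -> str:
    """Placeholder for Marker extraction; returns Markdown text."""
    return f"markdown extracted from {pdf_path}"


def detect_missing_tables(text: str) -> bool:
    """Placeholder for the GPT-4o check that flags missing tables."""
    return False


def extract_with_verification(pdf_path: str) -> str:
    text = parse_with_marker(pdf_path)
    for _ in range(MAX_TRIALS):
        if not detect_missing_tables(text):
            break  # model reports no missing tables: accept the text
        text = parse_with_marker(pdf_path)  # re-parse and re-check
    return text
```

The cap of four trials keeps the pipeline from looping indefinitely on documents whose tables Marker persistently fails to recover.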
Survey and analysis of hallucinations in large language models frontiersin.org Sep 29, 2025 8 facts
claim: Instruction-tuned models can still hallucinate, especially on long-context, ambiguous, or factual-recall tasks, as revealed by studies from OpenAI (2023a) and Bang and Madotto (2023).
measurement: DeepSeek demonstrated the highest CMV (Model Bias) value at 0.14, while GPT-4 maintained a lower CMV value of 0.08, which is consistent with better internal factual grounding as noted by OpenAI (2023b).
claim: Model-intrinsic hallucinations occur due to limitations in training data, architectural biases, or inference-time sampling strategies, even when well-organized prompts are used, as noted by Bang and Madotto (2023), OpenAI (2023a), and Chen et al. (2023).
claim: OpenAI reported in 2023 that GPT-4 hallucinates less frequently than smaller language models.
reference: OpenAI published the GPT-4 System Card in 2023, detailing safety and system information.
claim: The authors of the study did not evaluate larger closed-source models like Anthropic's Claude or OpenAI's GPT-4, noting that these systems have undergone extensive fine-tuning and may exhibit different hallucination profiles compared to the models tested.
claim: Models with extensive Reinforcement Learning from Human Feedback (RLHF), such as OpenAI's GPT-4, are more resistant to prompt adversaries compared to purely open-source models without such fine-tuning.
reference: Community-maintained leaderboards focusing on hallucination robustness have been established by OpenAI (2023a) and Kadavath et al. (2022).
A survey on augmenting knowledge graphs (KGs) with large ... link.springer.com Nov 4, 2024 6 facts
claim: The OpenAI Generative Pre-trained Transformer (GPT) series, including GPT-2, GPT-3, and GPT-4, established standards for Natural Language Processing.
claim: OpenAI's GPT-3 is designed to create coherent, relevant text, while Google's BERT focuses on understanding words in their context for NLP tasks.
reference: Roumeliotis KI and Tselikas ND authored 'ChatGPT and Open-AI Models: A Preliminary Review', published in Future Internet in 2023.
claim: OpenAI’s GPT series, Google’s BERT, T5, PaLM, and Gemini, and Meta’s RoBERTa, OPT, and LLaMA are recognized as state-of-the-art LLMs.
reference: Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I, et al. authored 'Language models are unsupervised multitask learners', published on the OpenAI blog in 2019.
measurement: OpenAI's GPT-3 model contains 175 billion parameters and is known for high-quality text generation, translation, question answering, and summarization.
A Survey on the Theory and Mechanism of Large Language Models arxiv.org Mar 12, 2026 5 facts
claim: Large Language Models such as ChatGPT (OpenAI, 2022), DeepSeek (Guo et al., 2025), Qwen (Bai et al., 2023a), Llama (Touvron et al., 2023), Gemini (Team et al., 2023), and Claude (Caruccio et al., 2024) have transcended the boundaries of traditional Natural Language Processing as established by Vaswani et al. (2017a).
claim: The paradigm that more computation leads to better reasoning has been popularized by the empirical success of inference-time scaling in leading reasoning models, as noted by OpenAI (2024) and Guo et al. (2025).
claim: OpenAI (2023) defined "Superalignment" as the critical AI safety challenge of ensuring that superintelligent AI systems act in accordance with human values, intentions, and goals.
claim: OpenAI introduced the "weak-to-strong generalization" (W2SG) paradigm (Burns et al., 2024), which demonstrates that strong pre-trained language models fine-tuned using supervision signals from weaker models consistently surpass the performance of their weak supervisors.
claim: ChatGPT, released by OpenAI in 2022, serves as proof of the potential described by the Universal Approximation Theorem.
Medical Hallucination in Foundation Models and Their Impact on ... medrxiv.org Nov 2, 2025 5 facts
claim: OpenAI's o3-mini and o1 models allocate more inference time to deliberate reasoning before producing responses, which improves performance on complex tasks such as scientific reasoning, coding, and mathematics.
claim: OpenAI's GPT-5 emphasizes advanced long-context reasoning and more reliable factual grounding.
claim: OpenAI's GPT-4o is a multimodal model capable of processing and generating text, images, and audio with improved factual consistency.
measurement: The OpenAI o1 model experienced a performance increase when using search augmentation, rising from a 64.0% baseline to 69.4% (a gain of 5.4 percentage points).
measurement: OpenAI released GPT-4o in May 2024 and GPT-4o mini in July 2024.
The Impact of Open Source on Digital Innovation linkedin.com 4 facts
claim: The OpenAI models gpt-oss-120b and gpt-oss-20b are released under an Apache 2.0 license, which permits free commercial use, modification, and redistribution.
claim: The OpenAI models gpt-oss-120b and gpt-oss-20b are designed for advanced agentic tasks, including tool use and code execution.
claim: OpenAI has released two new open-weight reasoning models named gpt-oss-120b and gpt-oss-20b.
measurement: DeepSeek reportedly spent $5.6 million to build its AI model, compared to OpenAI's reported $5 billion per year expenditure.
LLM Hallucination Detection and Mitigation: State of the Art in 2026 zylos.ai Jan 27, 2026 4 facts
reference: OpenAI Guardrails validates factual claims against reference documents using OpenAI's FileSearch API, making it effective for closed-source deployments where white-box methods are unavailable.
reference: OpenAI published 'Monitoring Reasoning Models for Misbehavior,' a guide on how to track and identify undesirable outputs in their reasoning models.
reference: OpenAI provides documentation for their reasoning models, which includes guidance on model behavior and evaluation.
claim: OpenAI's 2026 research on reasoning models demonstrates that naturally understandable chain-of-thought reasoning traces are reinforced through reinforcement learning, and that simply prompted GPT-4o models can effectively monitor for reward hacking in frontier reasoning models such as o1 and the successors of o3-mini.
Real-Time Evaluation Models for RAG: Who Detects Hallucinations ... cleanlab.ai Apr 7, 2025 3 facts
claim: The Cleanlab RAG benchmark uses OpenAI’s gpt-4o-mini LLM to power both the 'LLM-as-a-judge' and 'TLM' scoring methods.
reference: The FinQA dataset consists of complex questions from financial experts regarding public financial reports, with responses generated by OpenAI’s GPT-4o LLM.
claim: Evaluation techniques such as 'LLM-as-a-judge' or 'TLM' (Trustworthy Language Model) can be powered by any Large Language Model and do not require specific data preparation, labeling, or custom model infrastructure, provided the user has infrastructure to run pre-trained LLMs like AWS Bedrock, Azure/OpenAI, Gemini, or Together.ai.
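The 'LLM-as-a-judge' technique above reduces to assembling a grading prompt and sending it to any chat-capable model. A minimal sketch of such a prompt builder, with illustrative wording and a hypothetical 1-5 scale (neither is specified by the benchmark); the API call itself is omitted since any provider works:

```python
# Hedged sketch of an 'LLM-as-a-judge' prompt. The grading scale and
# instructions are illustrative assumptions, not the benchmark's exact text.

def judge_prompt(question: str, context: str, answer: str) -> str:
    """Build a grading prompt asking a judge LLM to score groundedness."""
    return (
        "You are grading an AI answer for factual consistency.\n"
        f"Question: {question}\n"
        f"Context: {context}\n"
        f"Answer: {answer}\n"
        "Reply with a single integer from 1 (hallucinated) to 5 (fully grounded)."
    )
```

Because the prompt carries all task-specific information, the same builder can be pointed at any hosted LLM without data preparation or fine-tuning, which is exactly the property the fact above describes.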
LLM Observability: How to Monitor AI When It Thinks in Tokens | TTMS ttms.com Feb 10, 2026 3 facts
procedure: Companies can track the effectiveness of content moderation by measuring the percentage of AI responses flagged for toxicity using tools like OpenAI's moderation API or custom models.
claim: OpenAI provides an API and dashboard for evaluation of AI models as a managed service, and also offers an open-source version for self-hosting.
reference: OpenAI Evals is an open-source framework created by OpenAI for systematically evaluating model outputs by defining tests that check outputs against known correct answers or style guidelines.
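The flag-rate metric in the moderation procedure above is a simple ratio. A minimal sketch, assuming the per-response flags have already been obtained (in production they would come from a moderation endpoint or a custom classifier, which is not called here):

```python
# Hedged sketch of the toxicity-flag-rate metric described above.
# The boolean flags stand in for per-response moderation results.

def flagged_rate(flags: list[bool]) -> float:
    """Return the percentage of responses flagged for toxicity."""
    if not flags:
        return 0.0
    return 100.0 * sum(flags) / len(flags)


print(flagged_rate([False, True, False, False]))  # → 25.0
```

Tracking this percentage over time (per model version or per deployment) is what turns a moderation tool into an effectiveness measure.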
Combining Knowledge Graphs and Large Language Models - arXiv arxiv.org Jul 9, 2024 3 facts
reference: OpenAI published the 'GPT-4V(ision) System Card' in 2023.
measurement: OpenAI released GPT-3 in 2020, which features 175 billion parameters.
claim: Examples of large language models include Google’s BERT, Google's T5, and OpenAI’s GPT series.
How to Improve Multi-Hop Reasoning With Knowledge Graphs and ... neo4j.com Jun 18, 2025 2 facts
How Enterprise AI, powered by Knowledge Graphs, is ... blog.metaphacts.com Oct 7, 2025 2 facts
measurement: OpenAI found that the GPT-3 large language model produced hallucinations, defined as authoritative-sounding but factually incorrect or fabricated responses, approximately 15% of the time.
measurement: OpenAI released ChatGPT to the general public in November 2022.
A framework to assess clinical safety and hallucination rates of LLMs ... nature.com May 13, 2025 2 facts
measurement: For consistency and reproducibility in the experiments, the researchers used OpenAI’s GPT-4 (model GPT-4-32k-0613) with the random seed set to 210, temperature set to 0, and a top-p value of 0.95.
measurement: The study used OpenAI’s GPT-4 (specifically version GPT-4-32k-0613) as the large language model for all experiments.
The Synergy of Symbolic and Connectionist AI in LLM-Empowered ... arxiv.org Jul 11, 2024 2 facts
Reference Hallucination Score for Medical Artificial ... medinform.jmir.org Jul 31, 2024 2 facts
reference: Temsah M et al. published a narrative review titled 'OpenAI's Sora and Google's Veo 2 in Action: A Narrative Review of Artificial Intelligence-driven Video Generation Models Transforming Healthcare' in Cureus in 2025.
reference: Temsah M, Jamal A, Alhasan K, Temsah A, and Malki K published a study titled 'OpenAI o1-Preview vs. ChatGPT in Healthcare: A New Frontier in Medical AI Reasoning' in the journal Cureus in 2024.
Building Trustworthy NeuroSymbolic AI Systems - arXiv arxiv.org 2 facts
Neuro-Symbolic AI: Explainability, Challenges & Future Trends linkedin.com Dec 15, 2025 1 fact
claim: OpenAI and Microsoft have reportedly considered Artificial General Intelligence to be an artificial intelligence system capable of generating $100 billion in profit, according to a 2024 TechCrunch report.
The Synergy of Symbolic and Connectionist AI in LLM ... arxiv.org 1 fact
claim: OpenAI’s GPT-4 is an example of a Large Language Model that demonstrates unprecedented capabilities in natural language understanding and generation, exhibiting robust performance across a range of complex tasks.
What Is Open Source Software? - IBM ibm.com 1 fact
claim: Open source LLMs promote a transparent, accessible, and community-driven approach compared to proprietary models like Google's LaMDA and OpenAI's ChatGPT (GPT-3.5) and GPT-4.
vectara/hallucination-leaderboard - GitHub github.com 1 fact
reference: The Vectara hallucination leaderboard utilizes specific API access points for various large language models: Llama 4 Maverick 17B 128E Instruct FP8 and Llama 4 Scout 17B 16E Instruct are accessed via Together AI; Microsoft Phi-4 and Phi-4-Mini are accessed via Azure; Mistral Ministral 3B, Ministral 8B, Mistral Large, Mistral Medium, and Mistral Small are accessed via Mistral AI's API; Kimi-K2-Instruct-0905 is accessed via the Moonshot AI API; GPT-4.1, GPT-4o, GPT-5-High, GPT-5-Mini, GPT-5-Minimal, GPT-5-Nano, o3-Pro, o4-Mini-High, and o4-Mini-Low are accessed via the OpenAI API; GPT-OSS-120B and GLM-4.5-AIR-FP8 are accessed via Together AI; Qwen3-4b, Qwen3-8b, Qwen3-14b, Qwen3-32b, and Qwen3-80b-a3b-thinking are accessed via the DashScope API; Snowflake-Arctic-Instruct is accessed via the Replicate API; Grok-3, Grok-4-Fast-Reasoning, and Grok-4-Fast-Non-Reasoning are accessed via xAI's API; and GLM-4.6 is accessed via DeepInfra.
Benchmarking Hallucination-Detection Frameworks - GitHub github.com 1 fact
procedure: The benchmarking_hallucination_detection_frameworks.ipynb notebook in the meshalJcheema/hallucination-benchmark-suite repository orchestrates multiple detection libraries and models, including OpenAI, Groq, UpTrain, and TruthLLM, on the mixed dataset and logs the raw outputs.
10 RAG examples and use cases from real companies - Evidently AI evidentlyai.com Feb 13, 2025 1 fact
procedure: To respond to student questions, ChatLTV provides the LLM with the user query and relevant context retrieved from a vector database, with content chunks served via OpenAI's API.
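The ChatLTV flow above is standard retrieval-augmented prompting: retrieve chunks, then pack them with the query into a chat request. A minimal sketch with a toy word-overlap retriever standing in for the vector database (the retriever, store layout, and message wording are all assumptions, not ChatLTV's implementation); the assembled messages would then go to a chat-completion API:

```python
# Hedged sketch of retrieval-augmented prompting. `retrieve_chunks` is a
# toy stand-in for the vector-database lookup; real systems use embeddings.

def retrieve_chunks(query: str, store: dict[str, str], k: int = 2) -> list[str]:
    """Rank stored chunks by shared words with the query; return the top k."""
    words = set(query.lower().split())
    ranked = sorted(
        store.values(),
        key=lambda chunk: len(words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_messages(query: str, chunks: list[str]) -> list[dict]:
    """Assemble chat messages: retrieved context as system, query as user."""
    context = "\n\n".join(chunks)
    return [
        {"role": "system", "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": query},
    ]
```

Grounding the system message in retrieved course content is what lets the assistant answer from the instructor's material rather than from the model's general knowledge.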
The role of open source in shaping software thetopvoices.com Nov 12, 2024 1 fact
perspective: Eugene Kublin, an expert software developer and contributor to projects involving OpenAI, Google, Nvidia, Amazon, Intel, Facebook, AMD, and Apache, views the shift toward open source business models—monetizing support, customization, and integration rather than access—as a key driver of software growth over the past few decades.
A Comprehensive Benchmark and Evaluation Framework for Multi ... arxiv.org Jan 6, 2026 1 fact
reference: OpenAI announced GPT-5 in 2025.
What is Open Source Software? - HotWax Systems hotwaxsystems.com Aug 11, 2025 1 fact
claim: Mistral, Gemma, Falcon, and Command R/R+ serve as open alternatives to commercial APIs such as OpenAI’s GPT and Anthropic’s Claude.
A Knowledge Graph-Based Hallucination Benchmark for Evaluating ... arxiv.org Feb 23, 2026 1 fact
measurement: The evaluation framework included 15 open-source models ranging from 8 billion to 1 trillion parameters, and 10 proprietary models from OpenAI, Google, Anthropic, and xAI.
Knowledge Graphs Enhance LLMs for Contextual Intelligence linkedin.com Mar 10, 2026 1 fact
reference: The solution guide for integrating Generative AI with graph data combines a Generative AI Agent (such as Google Gemini or OpenAI), a Remote Toolset Service powered by the Model Context Protocol (MCP), and a Neo4j Graph Database containing supply chain data.
What Really Causes Hallucinations in LLMs? - AI Exploration Journey aiexpjourney.substack.com Sep 12, 2025 1 fact
claim: OpenAI research suggests that large language models hallucinate because they are rewarded for guessing an answer even when they are uncertain, rather than being trained to state 'I don't know.'
Enterprise AI Requires the Fusion of LLM and Knowledge Graph stardog.com Dec 4, 2024 1 fact
account: Schellaert's team analyzed three major families of modern LLMs: OpenAI's ChatGPT, the LLaMA series developed by Meta, and the BLOOM suite made by BigScience.
Re-evaluating Hallucination Detection in LLMs - arXiv arxiv.org Aug 13, 2025 1 fact
reference: The 'GPT-4 Technical Report' by OpenAI et al. (2024) provides technical documentation and performance details for the GPT-4 large language model, published as an arXiv preprint.
Efficient Knowledge Graph Construction and Retrieval from ... - arXiv arxiv.org Aug 7, 2025 1 fact
procedure: The GraphRAG retrieval process uses a two-stage strategy: first, a high-recall one-hop graph traversal to identify candidate nodes, followed by a dense vector-based re-ranking step using OpenAI embeddings and cosine similarity to refine the results.
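The two-stage strategy above can be sketched with toy data: a one-hop traversal for recall, then a cosine-similarity re-rank for precision. The graph, the two-dimensional embeddings, and the function names are illustrative assumptions; the paper's pipeline uses OpenAI embeddings over a real knowledge graph:

```python
import math

# Hedged sketch of two-stage GraphRAG retrieval: high-recall one-hop
# traversal, then cosine-similarity re-ranking against a query embedding.


def one_hop(graph: dict[str, list[str]], seeds: list[str]) -> list[str]:
    """Recall stage: the seed nodes plus every node one edge away."""
    found = set(seeds)
    for seed in seeds:
        found.update(graph.get(seed, []))
    return sorted(found)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rerank(candidates: list[str], emb: dict[str, list[float]],
           query_emb: list[float]) -> list[str]:
    """Precision stage: order candidates by similarity to the query."""
    return sorted(candidates, key=lambda n: cosine(emb[n], query_emb),
                  reverse=True)
```

The split matters because graph traversal is cheap but coarse, while embedding comparison is precise but only tractable over the small candidate set the traversal produces.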
Construction of Knowledge Graphs: State and Challenges - arXiv arxiv.org 1 fact
procedure: The VisualSem image cleaning process applies four filters: checking for valid image files, removing duplicated images via SHA1 hashing, using a ResNet-based binary classifier to remove non-photographic images, and leveraging OpenAI’s CLIP to remove images that do not minimally match any of the node glosses.
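The SHA1 deduplication step in the pipeline above amounts to hashing each file's bytes and keeping only the first occurrence of each digest. A minimal sketch, with raw byte strings standing in for image files (the function name is illustrative, not from VisualSem):

```python
import hashlib

# Hedged sketch of SHA1-based duplicate removal, one of the four
# VisualSem cleaning filters. Byte strings stand in for image files.


def dedupe_by_sha1(images: list[bytes]) -> list[bytes]:
    """Keep the first occurrence of each distinct image; drop exact duplicates."""
    seen: set[str] = set()
    kept: list[bytes] = []
    for img in images:
        digest = hashlib.sha1(img).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(img)
    return kept
```

Content hashing catches byte-identical copies regardless of filename, which is why it precedes the more expensive classifier and CLIP filters.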
Leveraging Knowledge Graphs and LLM Reasoning to Identify ... arxiv.org Jul 23, 2025 1 fact
reference: The experimental evaluation of the LLM agent framework utilized OpenAI’s GPT-4o via Langchain QA chains, interacting with a Neo4j knowledge graph through LLM-generated Cypher queries, with configuration settings of temperature 0.0, top_p 0.95, and a 4096-token limit.
Open Source Licenses: Definition, Types, and Comparison solutionshub.epam.com Feb 3, 2023 1 fact
account: A lawsuit was filed against Microsoft and OpenAI alleging they breached intellectual property laws by utilizing open-source code published on GitHub to construct and train the Copilot service.
Context Graph vs Knowledge Graph: Key Differences for AI - Atlan atlan.com Jan 27, 2026 1 fact
claim: The Model Context Protocol (MCP) is a Linux Foundation project supported by AWS, Anthropic, Google, Microsoft, and OpenAI.