gpt-4o-mini
Facts (14)
Sources
Integrating Knowledge Graphs into RAG-Based LLMs to Improve ... thesis.unipd.it 2 facts
measurement: The GPT-4o-Mini model achieved a peak overall accuracy of approximately 76.8%, an 8.4% improvement over baseline, when summaries were included as context in the prompt.
perspective: Gemini-1.5-Flash prioritizes balanced decision-making in fact-checking tasks, whereas GPT-4o-Mini is more effective at maximizing correct predictions, even if it favors the majority class.
Benchmarking Hallucination Detection Methods in RAG - Cleanlab cleanlab.ai Sep 30, 2024 3 facts
claim: For fair comparison in the Cleanlab benchmark, the underlying LLM for all hallucination detection methods is fixed to gpt-4o-mini.
procedure: The Hallucination Metric from the DeepEval package estimates the likelihood of hallucination as the degree to which an LLM response contradicts or disagrees with the provided context, as assessed by an LLM (specifically GPT-4o-mini in the Cleanlab study).
procedure: RAGAS++ is a refined variant of the RAGAS technique developed by Cleanlab that uses the gpt-4o-mini LLM for generation and as a critic, replacing the default gpt-3.5-turbo-16k and gpt-4 models.
Medical Hallucination in Foundation Models and Their Impact on ... medrxiv.org Nov 2, 2025 3 facts
measurement: The performance trajectory from gpt-4o-mini to gemini-2.5-pro represents an 81.4% relative improvement in hallucination mitigation.
measurement: The AI model o1 achieves a hallucination resistance baseline of 64.0%, while the earlier-generation models gpt-4o and gpt-4o-mini achieve baselines of 54.4% and 48.3%, respectively.
measurement: OpenAI released GPT-4o in May 2024 and GPT-4o mini in July 2024.
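If the 81.4% relative-improvement figure above is computed as (new − old) / old against gpt-4o-mini's 48.3% baseline (an assumption; the paper may define the ratio differently, and gemini-2.5-pro's absolute score is not stated in these facts), the implied score can be recovered with a short sketch:

```python
def relative_improvement(old: float, new: float) -> float:
    """Relative improvement of `new` over `old`, as a fraction of `old`."""
    return (new - old) / old

# Assuming the 81.4% figure is relative to gpt-4o-mini's 48.3% baseline,
# the implied gemini-2.5-pro hallucination-resistance score would be:
implied_score = 48.3 * (1 + 0.814)
print(f"implied gemini-2.5-pro resistance: {implied_score:.1f}%")  # 87.6%
```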
Large Language Models Meet Knowledge Graphs for Question ... arxiv.org Sep 22, 2025 2 facts
reference: CoT-RAG, as described by Li et al. (2025a), utilizes KG-driven CoT generation and knowledge-aware RAG with pseudo-program KGs, employing ERNIE-Speed-128K and GPT-4o-mini models for KGQA and multi-hop QA tasks.
reference: KG-IRAG, as described by Yang et al. (2025), utilizes incremental retrieval and iterative reasoning with Llama-3-8B-Instruct, GPT-3.5-Turbo, GPT-4o-mini, and GPT-4o models on self-constructed knowledge graphs for temporal QA tasks.
Real-Time Evaluation Models for RAG: Who Detects Hallucinations ... cleanlab.ai Apr 7, 2025 1 fact
claim: The Cleanlab RAG benchmark uses OpenAI's gpt-4o-mini LLM to power both the 'LLM-as-a-judge' and 'TLM' scoring methods.
Re-evaluating Hallucination Detection in LLMs - arXiv arxiv.org Aug 13, 2025 1 fact
procedure: The LLM-as-Judge approach for evaluating response correctness leverages GPT-4o-Mini to classify generated responses as 'correct,' 'incorrect,' or 'refuse,' with 'refuse' treated as a hallucination.
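The three-way judging scheme above reduces to a simple binary mapping once the judge's label is obtained. A minimal sketch (the label strings come from the fact above; the function name and normalization are illustrative, and the judge call itself is out of scope):

```python
VALID_LABELS = {"correct", "incorrect", "refuse"}

def is_hallucination(judge_label: str) -> bool:
    """Map an LLM-as-Judge label to a binary hallucination flag.

    Per the scheme above, 'refuse' is treated as a hallucination,
    so anything other than 'correct' counts as hallucinated.
    """
    label = judge_label.strip().lower()
    if label not in VALID_LABELS:
        raise ValueError(f"unexpected judge label: {judge_label!r}")
    return label != "correct"
```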
Medical Hallucination in Foundation Models and Their ... medrxiv.org Mar 3, 2025 1 fact
claim: OpenAI's GPT-4o-mini model, released in July 2024, is a smaller, cost-effective version of GPT-4o that maintains strong performance with greater efficiency.
Bridging the Gap Between LLMs and Evolving Medical Knowledge arxiv.org Jun 29, 2025 1 fact
claim: The study utilized GPT-4o-mini as the backbone for both the Medical Knowledge Graph (MKG) and AMG-RAG implementations, serving as the core component for reasoning, RAG, and structured knowledge integration.