Relations (1)

related (score 3.46) · strongly supported by 10 facts

GPT-4 and LLaMA are both categorized as Large Language Models (LLMs) built on transformer architectures and large-scale pretraining [1], [2], [3]. Research frequently compares the two on benchmark performance, instruction adherence, and hallucination detection [4], [5], [6], [7].

Facts (10)

Sources
The construction and refined extraction techniques of knowledge ... · nature.com · Nature · 2 facts
claim: Large-scale language models such as GPT-4, LLaMA, and PaLM are key enablers of automated knowledge graph construction due to their strong semantic understanding and reasoning capabilities.
claim: Large-scale pre-trained Large Language Models (LLMs) such as GPT-4 and LLaMA-3 use large-scale pretraining and task-specific fine-tuning to achieve cross-task generalization.
EdinburghNLP/awesome-hallucination-detection - GitHub · github.com · GitHub · 1 fact
measurement: According to AnyScale, Llama 2 is approximately as factually accurate as GPT-4 for summaries and is 30 times cheaper to operate.
The Synergy of Symbolic and Connectionist AI in LLM-Empowered ... · arxiv.org · arXiv · 1 fact
claim: Large Language Models (LLMs) are transformer-based language models, including OpenAI’s GPT-4, Google’s Gemini and PaLM, Microsoft’s Phi-3, and Meta’s LLaMA.
MedHallu - GitHub · github.com · GitHub · 1 fact
measurement: State-of-the-art Large Language Models, including GPT-4o, Llama-3.1, and UltraMedical, struggle with hard hallucination categories in the MedHallu benchmark, achieving a best F1 score of 0.625.
Medical Hallucination in Foundation Models and Their ... · medrxiv.org · medRxiv · 1 fact
claim: Pretrained Large Language Models such as GPT-3, GPT-4, PaLM, LLaMA, and BERT have demonstrated advancements due to the extensive datasets used in their training.
Knowledge Graphs Enhance LLMs for Contextual Intelligence · linkedin.com · LinkedIn · 1 fact
procedure: The author's 'SKILL.md' file contains hard-coded logic that forces AI models, including Claude, GPT-4o, and local Llama 3 instances, to follow a deterministic path for entity extraction.
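The fact above does not spell out what the 'deterministic path' looks like. A minimal hypothetical sketch, assuming a fixed entity vocabulary and a literal substring match (the actual SKILL.md logic is not given), illustrates the idea that the same input always yields the same extraction regardless of which model is involved:

```python
# Hypothetical sketch of a hard-coded, deterministic entity-extraction path.
# KNOWN_ENTITIES and the matching rule are assumptions for illustration,
# not the author's actual SKILL.md logic.

KNOWN_ENTITIES = ["GPT-4o", "Claude", "Llama 3"]  # assumed fixed vocabulary

def extract_entities(text: str) -> list[str]:
    """Return known entities in order of first appearance.
    Deterministic: identical input always produces identical output."""
    found = [(text.index(e), e) for e in KNOWN_ENTITIES if e in text]
    return [entity for _, entity in sorted(found)]
```

The point of forcing such a path is reproducibility: a dictionary lookup cannot hallucinate an entity, whereas free-form model extraction can.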
Track: Poster Session 3 (AISTATS 2026) · virtual.aistats.org · Samuel Tesfazgi, Leonhard Sprandl, Sandra Hirche · AISTATS · 1 fact
claim: Adversarial attacks on Large Language Models (LLMs) for time series forecasting lead to more severe performance degradation than random noise across models including LLMTime with GPT-3.5, GPT-4, LLaMa, Mistral, TimeGPT, and TimeLLM.
Building Trustworthy NeuroSymbolic AI Systems - arXiv · arxiv.org · arXiv · 1 fact
claim: GPT-3.5, Claude, and GPT-4.0 adhere more closely to instructions than LLama2 (Touvron et al. 2023), Vicuna (Chiang et al. 2023), and Falcon (Penedo et al. 2023).
A Comprehensive Benchmark for Detecting Medical Hallucinations ... · aclanthology.org · Shrey Pandit, Jiawei Xu, Junyuan Hong, Zhangyang Wang, Tianlong Chen, Kaidi Xu, Ying Ding · ACL Anthology · 1 fact
measurement: State-of-the-art large language models, including GPT-4o, Llama-3.1, and the medically fine-tuned UltraMedical, struggle with the binary hallucination detection task in MedHallu, with the best model achieving an F1 score as low as 0.625 for detecting 'hard' category hallucinations.
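Two of the facts above report a best F1 of 0.625 on MedHallu's hard category. F1 is the harmonic mean of precision and recall, so many precision/recall pairs are consistent with that score; the sketch below shows the standard formula and one such illustrative pair (the benchmark does not report the underlying precision and recall):

```python
# Standard F1 definition: harmonic mean of precision and recall.
# The example inputs below are illustrative, not MedHallu's actual P and R.

def f1(precision: float, recall: float) -> float:
    """F1 = 2 * P * R / (P + R)."""
    return 2 * precision * recall / (precision + recall)

# e.g. precision 0.5 with recall 5/6 yields exactly 0.625,
# as does a balanced precision = recall = 0.625.
```

Because the harmonic mean is dominated by the smaller of the two values, an F1 of 0.625 caps both precision and recall well below reliable detection, which is the sense in which these models "struggle".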