Relations (1)
related — score 12.00, strongly supporting (12 facts)
LLaMA is a specific instance of a Large Language Model, as explicitly categorized in [1], [2], and [3]. LLaMA models also frequently serve as subjects of LLM research and benchmarking, appearing in studies of model performance, hallucination detection, and adversarial robustness in [4], [5], [6], [7], [8], [9], and [10].
Facts (12)
Sources
Building Trustworthy NeuroSymbolic AI Systems arxiv.org 1 fact
reference: Retrieval-Augmented Generation (RAG) Language Models, including REALM (Guu et al. 2020), LAMA (Petroni et al. 2019), ISEEQ (Gaur et al. 2022), and RAG (Lewis et al. 2020), integrate a generator with a dense passage retriever and access to indexed data sources to add a layer of supervision to model outputs.
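The retrieve-then-generate pattern this fact describes can be sketched in a few lines. Everything below is an illustrative assumption, not the REALM or RAG implementation: the toy corpus, the bag-of-words `embed` function (standing in for a learned dense encoder), and the `rag_prompt` template are all hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a stand-in for a learned dense encoder."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, passages, k=2):
    """Rank indexed passages by similarity to the query (retriever stand-in)."""
    q = embed(query)
    return sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]

def rag_prompt(query, passages, k=2):
    """Supervise the generator by prepending retrieved evidence to the query."""
    evidence = retrieve(query, passages, k)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(evidence))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "LLaMA is a family of large language models released by Meta.",
    "Dense passage retrieval scores documents with vector similarity.",
    "Bread is made from flour, water, and yeast.",
]
print(rag_prompt("What is LLaMA?", corpus))
```

The "layer of supervision" the fact mentions comes from the prompt construction: the generator is conditioned on retrieved passages rather than answering from parametric memory alone.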
A Survey on the Theory and Mechanism of Large Language Models arxiv.org 1 fact
claim: Large Language Models such as ChatGPT (OpenAI, 2022), DeepSeek (Guo et al., 2025), Qwen (Bai et al., 2023a), Llama (Touvron et al., 2023), Gemini (Team et al., 2023), and Claude (Caruccio et al., 2024) have transcended the boundaries of traditional Natural Language Processing as established by Vaswani et al. (2017a).
What is Open Source Software? - HotWax Systems hotwaxsystems.com 1 fact
reference: Ollama is a streamlined interface designed to run Large Language Models (LLMs) such as LLaMA, Gemma, or Mistral on personal machines.
EdinburghNLP/awesome-hallucination-detection - GitHub github.com 1 fact
procedure: The BAFH framework is a lightweight method that trains a feedforward classifier on hidden states of Large Language Models to determine belief states and classify hallucination types, as evaluated against MIND and SAR baselines using Gemma-2, Llama-3.1, and Mistral models.
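The general technique behind this fact — a lightweight feedforward probe trained on hidden-state vectors — can be sketched as follows. This is not the BAFH implementation: the synthetic Gaussian features (standing in for real Gemma/Llama/Mistral hidden states), the single-layer logistic probe, and all hyperparameters are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for transformer hidden states: class 0 ("faithful")
# and class 1 ("hallucinated") drawn from two shifted Gaussians. A real
# probe would read these vectors from an actual LLM's layers instead.
dim, n = 16, 200
X = np.vstack([rng.normal(-0.5, 1.0, (n, dim)), rng.normal(0.5, 1.0, (n, dim))])
y = np.concatenate([np.zeros(n), np.ones(n)])

# Minimal one-layer feedforward (logistic) probe trained by gradient descent.
w, b, lr = np.zeros(dim), 0.0, 0.1
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid output
    grad_w = X.T @ (p - y) / len(y)          # cross-entropy gradient w.r.t. w
    grad_b = float(np.mean(p - y))           # cross-entropy gradient w.r.t. b
    w -= lr * grad_w
    b -= lr * grad_b

preds = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(float)
accuracy = float(np.mean(preds == y))
print(f"train accuracy: {accuracy:.2f}")
```

The appeal of such probes, as the fact notes, is that they are lightweight: only the small classifier is trained, while the LLM that produces the hidden states stays frozen.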
The Synergy of Symbolic and Connectionist AI in LLM-Empowered ... arxiv.org 1 fact
claim: Large Language Models (LLMs) are transformer-based language models, including OpenAI’s GPT-4, Google’s Gemini and PaLM, Microsoft’s Phi-3, and Meta’s LLaMA.
MedHallu - GitHub github.com 1 fact
measurement: State-of-the-art Large Language Models, including GPT-4o, Llama-3.1, and UltraMedical, struggle with hard hallucination categories in the MedHallu benchmark, achieving a best F1 score of 0.625.
Medical Hallucination in Foundation Models and Their ... medrxiv.org 1 fact
claim: Pretrained Large Language Models such as GPT-3, GPT-4, PaLM, LLaMA, and BERT have demonstrated advancements due to the extensive datasets used in their training.
The construction and refined extraction techniques of knowledge ... nature.com 1 fact
claim: Large-scale pre-trained Large Language Models (LLMs) such as GPT-4 and LLaMA-3 utilize large-scale pretraining and task-specific fine-tuning to achieve cross-task generalization.
Construction of intelligent decision support systems through ... - Nature nature.com 1 fact
claim: Large language models such as Mistral 7B and LLaMA-2 often struggle with contextual understanding, transparency, and multi-step reasoning across multiple domains.
Track: Poster Session 3 - aistats 2026 virtual.aistats.org 1 fact
claim: Adversarial attacks on Large Language Models (LLMs) for time series forecasting lead to more severe performance degradation than random noise across models including LLMTime with GPT-3.5, GPT-4, LLaMa, Mistral, TimeGPT, and TimeLLM.
A Knowledge Graph-Based Hallucination Benchmark for Evaluating ... arxiv.org 1 fact
reference: The paper 'The Llama 3 Herd of Models' documents the Llama 3 family of large language models.
A Comprehensive Benchmark for Detecting Medical Hallucinations ... aclanthology.org 1 fact
measurement: State-of-the-art large language models, including GPT-4o, Llama-3.1, and the medically fine-tuned UltraMedical, struggle with the binary hallucination detection task in MedHallu, with the best model achieving an F1 score as low as 0.625 for detecting 'hard' category hallucinations.