Zero-Shot
Facts (13)
Sources
Re-evaluating Hallucination Detection in LLMs arxiv.org Aug 13, 2025 5 facts
claim: For the Llama model, the performance discrepancy between ROUGE and LLM-as-Judge evaluation narrows significantly under few-shot prompting compared to the zero-shot setting.
claim: Alternative metrics such as BERTScore, BLEU, and UniEval-fact exhibit substantial shortcomings in reliably detecting hallucinations in question-answering tasks, particularly under zero-shot conditions.
measurement: The Mistral model degrades markedly in zero-shot settings, with clear drops in Perplexity-based scores, whereas the Llama model stays comparatively stable with minimal degradation.
claim: Few-shot settings consistently yield more reliable evaluations across metrics than zero-shot settings.
claim: Semantic Entropy is the most consistent metric across both zero-shot and few-shot settings, while traditional metrics like Perplexity and LN-Entropy are more sensitive to the change of setting.
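The Perplexity and LN-Entropy scores referenced above can be sketched from per-token log-probabilities; the function names and example numbers below are illustrative assumptions, not the paper's implementation.

```python
import math

def perplexity(token_logprobs):
    """Perplexity of one generated answer from its per-token log-probs.
    Higher values mean the model was less certain of its own output,
    which these metrics treat as a hallucination signal."""
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

def ln_entropy(sampled_logprobs):
    """Length-normalized predictive entropy: average the per-sample
    mean negative log-likelihood over several sampled answers."""
    per_sample = [-sum(lp) / len(lp) for lp in sampled_logprobs]
    return sum(per_sample) / len(per_sample)

# A confidently generated answer scores lower than an uncertain one.
confident = perplexity([-0.05, -0.10, -0.02])
uncertain = perplexity([-1.8, -2.4, -1.1])
```

Semantic Entropy goes one step further by clustering sampled answers into meaning-equivalent groups before computing entropy, which is why it is less sensitive to surface-level wording changes between zero-shot and few-shot prompts.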
Survey and analysis of hallucinations in large language models frontiersin.org Sep 29, 2025 3 facts
claim: Hallucination scores change little across prompting techniques (zero-shot, few-shot, CoT, instruction) because the prompt variants are semantically equivalent and decoding is low-entropy, so outputs are dominated by the models' learned alignment policies.
procedure: The experimental pipeline evaluates hallucinations in open-source LLMs by combining benchmark datasets, varied prompt strategies (zero-shot, few-shot, CoT), and text generation via HuggingFace.
procedure: The prompt engineering protocol covers five categories: zero-shot (basic instruction), few-shot (2-3 input-output examples), instruction (structured natural language), chain-of-thought (step-by-step reasoning), and vague/misleading (intentionally unclear).
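The five prompt categories in that protocol can be sketched as simple template builders; the exact templates and wording below are assumptions, not the study's prompts.

```python
def zero_shot(q):
    # Basic instruction, no examples
    return f"Answer the question.\nQ: {q}\nA:"

def few_shot(q, examples):
    # 2-3 input-output examples prepended before the target question
    shots = "\n".join(f"Q: {eq}\nA: {ea}" for eq, ea in examples)
    return f"{shots}\nQ: {q}\nA:"

def instruction(q):
    # Structured natural-language instruction
    return f"You are a careful assistant. Give a concise, factual answer.\nQuestion: {q}\nAnswer:"

def chain_of_thought(q):
    # Elicit step-by-step reasoning before the final answer
    return f"Q: {q}\nA: Let's think step by step."

def vague(q):
    # Intentionally unclear framing, used to probe robustness
    return f"Someone mentioned something about this: {q} Thoughts?"
```

The survey's claim above predicts that the first four builders, being semantically equivalent requests, produce similar hallucination scores, while only the vague/misleading category meaningfully shifts behavior.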
Grounding LLM Reasoning with Knowledge Graphs arxiv.org Dec 4, 2025 1 fact
reference: The baseline methods compared are (1) Zero-Shot, querying the model without additional context; (2) Text RAG, using text representations of nodes as input; (3) Graph RAG, which also includes 1-hop node neighbors; and (4) Graph CoT (Agent), implementing Graph CoT as an agent for reasoning.
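The difference between those baselines is essentially what context gets packed into the prompt. A minimal sketch, where the graph representation and prompt wording are assumptions for illustration:

```python
def zero_shot_prompt(question):
    # (1) Query the model with no additional context
    return question

def text_rag_prompt(question, node_texts):
    # (2) Prepend the text representation of retrieved nodes
    return "Context:\n" + "\n".join(node_texts) + f"\n\nQuestion: {question}"

def graph_rag_prompt(question, graph, node_id):
    # (3) Also include the 1-hop neighbors of the retrieved node
    texts = [graph[node_id]["text"]]
    texts += [graph[n]["text"] for n in graph[node_id]["neighbors"]]
    return "Context:\n" + "\n".join(texts) + f"\n\nQuestion: {question}"

# (4) Graph CoT wraps the same graph access in an agent loop that
# interleaves reasoning steps with neighbor lookups (not shown here).
```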
LLM Hallucination Detection and Mitigation: State of the Art in 2026 zylos.ai Jan 27, 2026 1 fact
measurement: Chain-of-Verification (CoVe) improves F1 by 23% (from 0.39 to 0.48) and outperforms Zero-Shot, Few-Shot, and Chain-of-Thought prompting, though it does not eliminate hallucinations in complex reasoning chains.
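The 23% figure is a relative gain, which checks out against the raw scores: (0.48 - 0.39) / 0.39 ≈ 0.231. A quick verification, with F1 shown from precision/recall for context (inputs to f1 other than the reported scores are illustrative):

```python
def f1(precision, recall):
    # F1 is the harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

baseline_f1 = 0.39   # best prompting baseline reported above
cove_f1 = 0.48       # Chain-of-Verification

relative_gain = (cove_f1 - baseline_f1) / baseline_f1  # ~0.231, i.e. ~23%
```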
Medical Hallucination in Foundation Models and Their ... medrxiv.org Mar 3, 2025 1 fact
procedure: The 'Base' method queries the models directly with questions from the Med-HALT benchmark, without additional context or instructions, to assess inherent hallucination tendencies in a zero-shot setting.
Track: Poster Session 3 - aistats 2026 virtual.aistats.org 1 fact
formula: In the zero-shot case, where no regression data is available, task shift from classification to regression is impossible in both the sparse-signal and random-signal models for any Gaussian covariate distribution.
A Survey on the Theory and Mechanism of Large Language Models arxiv.org Mar 12, 2026 1 fact
reference: The paper "Revisiting chain-of-thought prompting: zero-shot can be stronger than few-shot" is an arXiv preprint, arXiv:2506.14641.