In-Context Learning
Also known as: ICL
Facts (70)
Sources
A Survey on the Theory and Mechanism of Large Language Models arxiv.org Mar 12, 2026 43 facts
claim: In-context learning is a phenomenon where a large language model appears to learn new tasks at inference time without requiring gradient updates.
claim: Xing et al. (2024) demonstrated that positional encoding and multi-head attention improve the predictive performance of in-context learning (ICL) when applied to linear regression tasks using unstructured data.
claim: Xie et al. (2021) created the Generative In-Context learning dataset (GINC), a small-scale synthetic dataset used to study the mechanism of in-context learning.
claim: Transformer-based Large Language Models demonstrate in-context learning capabilities, as established by Vaswani et al. (2017b), Brown et al. (2020), Wei et al. (2022b), Dong et al. (2022), and Liu et al. (2023b).
reference: Wang et al. (2023) identified that in input-label pairs during in-context learning (ICL), label tokens act as anchors where semantic information from the context aggregates at the shallower layers of large language models, and final predictions reference this aggregated information.
claim: Olsson et al. (2022) identified induction heads as specific attention heads whose learned algorithm underlies a large fraction of in-context learning in Large Language Models.
measurement: When a problem becomes sparse, the prediction error of in-context learning (ICL) in Transformers is comparable to the solution of the Lasso problem, according to Garg et al. (2022).
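The Lasso estimator that Garg et al. compare against has a closed form in the special case of an orthonormal design, where it reduces to soft-thresholding the OLS coefficients. A minimal sketch of that special case (illustrative toy numbers, not the paper's experimental setup):

```python
def soft_threshold(z: float, lam: float) -> float:
    """Proximal operator of the L1 norm: shrink z toward 0 by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_orthonormal(ols_coefs, lam):
    """Lasso solution under an orthonormal design: soft-threshold each
    OLS coefficient, so small (noise-level) coefficients become exactly 0."""
    return [soft_threshold(b, lam) for b in ols_coefs]

# A dense, noisy OLS estimate of a 2-sparse signal...
ols = [2.3, 0.04, -1.7, -0.06, 0.03]
# ...becomes exactly sparse after thresholding.
sparse = lasso_orthonormal(ols, lam=0.1)
```

The exact sparsity of the output is what makes Lasso the natural baseline for ICL on sparse regression problems.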
claim: Researchers (2025a) analyzed the optimization dynamics of a single-layer Transformer with normalized ReLU self-attention under in-context learning (ICL) mechanisms, finding that smaller eigenvalues of attention weights preserve basic knowledge, while larger eigenvalues capture specialized knowledge.
claim: The 'Empirical Camp' in large language model research investigates the characteristics of the in-context learning (ICL) process through experiments rather than theory, providing empirical insights for theoretical analysis.
reference: The paper 'Large language models are latent variable models: explaining and finding good demonstrations for in-context learning' posits that large language models function as latent variable models.
reference: Wei et al. (2023) explored in-context learning (ICL) using GPT-3, InstructGPT, Codex, PaLM, and Flan-PaLM across different configurations.
claim: Anchor-based re-weighting methods, demonstration compression techniques, and diagnostic analysis frameworks for in-context learning (ICL) errors have been proposed and validated to improve ICL performance.
perspective: The 'Representation Camp' perspective posits that Large Language Models (LLMs) store memories about various topics during pretraining, and in-context learning retrieves contextually relevant topics during inference based on demonstrations.
claim: Current literature on Large Language Models identifies several unpredictable behaviors at scale, including in-context learning (Brown et al., 2020), complex hallucinations (Xu et al., 2024b), and 'aha moments' observed during training (Guo et al., 2025).
perspective: Cheng et al. (2025a) argue that in-context learning (ICL) does not benefit reasoning models that utilize long chains-of-thought, suggesting that research on ICL requires multidimensional perspectives.
reference: The paper 'Training dynamics of multi-head softmax attention for in-context learning: emergence, convergence, and optimality' was published in The Thirty-Seventh Annual Conference on Learning Theory, page 4573.
reference: The paper 'What can transformers learn in-context? A case study of simple function classes' was published in Advances in Neural Information Processing Systems 35, pp. 30583–30598.
reference: The paper 'Why can GPT learn in-context? Language models secretly perform gradient descent as meta-optimizers' is an arXiv preprint (arXiv:2212.10559).
claim: Xie et al. (2021) demonstrated that the asymptotic prediction error for in-context learning achieves optimality even under a distribution mismatch, provided the signal about the latent concept in each prompt example outweighs the error arising from the mismatch.
reference: The paper 'Can looped transformers learn to implement multi-step gradient descent for in-context learning?' is an arXiv preprint, identified as arXiv:2410.08292.
claim: Wei et al. (2023) observed that smaller large language models primarily rely on semantic priors from pretraining during in-context learning (ICL) and often disregard label flips in the context, whereas larger models can override these priors when faced with label flips.
claim: Large Language Models exhibit emergent phenomena not found in smaller models, including hallucination, in-context learning (ICL), scaling laws, and sudden 'aha moments' during training.
reference: The paper 'Transformers as algorithms: generalization and stability in in-context learning' is available as arXiv preprint arXiv:2301.07067.
claim: Wei et al. (2023) found that sufficiently large language models can perform linear classification tasks even when the in-context learning (ICL) setting involves semantically unrelated labels.
claim: Chen et al. (2024e) used gradient flow to analyze how a simplified Transformer architecture with two attention layers performs in-context learning, revealing the collaborative mechanism of its components.
reference: The paper 'In-context learning with transformers: softmax attention adapts to function Lipschitzness' is an arXiv preprint (arXiv:2402.11639) regarding in-context learning.
claim: The accuracy of in-context learning (ICL) in large language models depends on the independent specification of input and label spaces, the distribution of the input text, and the format of the input-output pairs.
reference: The paper 'Larger language models do in-context learning differently' (arXiv:2303.03846) compares in-context learning behaviors across different model sizes.
claim: Large language models do not learn new tasks during in-context learning (ICL); instead, they use demonstration information to locate tasks or topics, while the ability to perform tasks is learned during pretraining.
reference: The paper 'CausalLM is not optimal for in-context learning' is an arXiv preprint, identified as arXiv:2308.06912.
reference: Gatmiry et al. (2024) studied whether looped Transformers can implement multi-step gradient descent in an in-context learning setting.
reference: The paper 'Label words are anchors: an information flow perspective for understanding in-context learning' (arXiv:2305.14160) examines in-context learning through the lens of information flow.
claim: Min et al. (2022a) found that replacing labels in input-label pairs with random ones during in-context learning inference results in only marginal decreases in performance across 12 models, including GPT-3, which contrasts with findings by Xie et al. (2021).
reference: The paper 'Selective induction heads: how transformers select causal structures in context' was presented at The Thirteenth International Conference on Learning Representations.
claim: Zhang et al. (2024b) studied the training dynamics of a Transformer with a single linear attention layer during in-context learning for linear regression tasks and showed that the model can find the global minimum of the objective function.
reference: The paper 'Transformers learn in-context by gradient descent' was published in the International Conference on Machine Learning, pp. 35151–35174.
claim: Transformers and LSTMs both possess the ability to learn in-context, and this capability improves with the length and quantity of demonstrations.
claim: Chen et al. (2024d) proved that Transformer training dynamics consist of three distinct phases: warm-up, emergence, and convergence, with in-context learning capabilities rapidly emerging during the emergence phase.
reference: The paper 'Transformers implement functional gradient descent to learn non-linear functions in context' is an arXiv preprint, identified as arXiv:2312.06528.
claim: Zheng et al. (2024) demonstrated that autoregressively trained Transformers can implement in-context learning by learning a meta-optimizer, specifically learning to perform one-step gradient descent to solve ordinary least squares (OLS) problems under specific initial data distribution conditions.
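The one-step-gradient-descent picture in Zheng et al.'s result can be made concrete for ordinary least squares. A minimal sketch with hypothetical toy data: under an identity design, a single gradient step from zero with a suitable step size lands exactly on the OLS solution.

```python
def gd_step(X, y, w, eta):
    """One gradient-descent step on the OLS loss (1/2n) * sum_i (x_i.w - y_i)^2."""
    n, d = len(X), len(w)
    # Residuals r_i = x_i.w - y_i
    r = [sum(xij * wj for xij, wj in zip(xi, w)) - yi for xi, yi in zip(X, y)]
    # Gradient (1/n) * X^T r
    grad = [sum(X[i][j] * r[i] for i in range(n)) / n for j in range(d)]
    return [wj - eta * g for wj, g in zip(w, grad)]

# Identity design: the OLS solution is just y, and one step
# from w = 0 with step size eta = n recovers it exactly.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [2.0, -1.0]
w1 = gd_step(X, y, [0.0, 0.0], eta=2.0)  # -> [2.0, -1.0]
```

For a general design the step size that works in one shot depends on the data covariance, which is why such results hold only under specific distributional conditions.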
claim: In-context learning is a form of few-shot learning where a model is provided with a small number of input-label pairs as examples, allowing the model to recognize a task and provide an answer for a query without parameter updates.
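The few-shot setup described here is just a prompt that concatenates labeled demonstrations ahead of an unlabeled query. A minimal sketch (the "Input:/Label:" template is an illustrative choice, not a standard):

```python
def build_icl_prompt(demos, query):
    """Concatenate (input, label) demonstrations, then the unlabeled query."""
    blocks = [f"Input: {x}\nLabel: {y}" for x, y in demos]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)

prompt = build_icl_prompt(
    [("The movie was wonderful.", "positive"),
     ("A tedious, joyless slog.", "negative")],
    "I could not stop smiling.",
)
# The model is expected to continue the prompt with the query's label;
# no parameters are updated at any point.
```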
reference: The paper 'A survey for in-context learning' is an arXiv preprint, identified as arXiv:2301.00234.
claim: Arora et al. (2024) demonstrated that a recurrent model's ability to recall information is sensitive to the order of input presentation by reducing the in-context learning problem to a set-disjointness task.
Track: Poster Session 3 - AISTATS 2026 virtual.aistats.org 7 facts
claim: In-Context Learning (ICL) allows Large Language Models (LLMs) to complete tasks using examples provided in a prompt without tuning model parameters.
claim: Yingqian Cui, Jie Ren, Pengfei He, Hui Liu, Jiliang Tang, and Yue Xing present a theoretical analysis comparing the exact convergence of single-head and multi-head attention in transformers for in-context learning with linear regression tasks.
formula: The In-Context Learning (ICL) average error of pretrained Large Language Models (LLMs) is the sum of O(T^-1) and the pretraining error.
formula: When the number of in-context examples D increases, the prediction loss for both single-head and multi-head attention in transformers is O(1/D), but the prediction loss for multi-head attention has a smaller multiplicative constant.
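The O(1/D) rate can be illustrated generically: any averaging-based estimator's squared error shrinks at this rate as the number of examples D grows. A minimal Monte Carlo sketch, using the sample mean as a stand-in estimator rather than the attention model analyzed in the paper:

```python
import random

def mean_sq_error(D, reps=2000, rng=random.Random(0)):
    """Average squared error of the sample mean of D Uniform(0,1) draws
    around the true mean 0.5; theory predicts (1/12)/D."""
    errs = []
    for _ in range(reps):
        m = sum(rng.random() for _ in range(D)) / D
        errs.append((m - 0.5) ** 2)
    return sum(errs) / reps

# Tenfold more in-context examples -> roughly tenfold smaller error.
e10, e100 = mean_sq_error(10), mean_sq_error(100)
```

The paper's point is that the rate is the same for single-head and multi-head attention; the architectures differ only in the constant in front of 1/D.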
claim: Perfectly pretrained Large Language Models (LLMs) perform Bayesian Model Averaging (BMA) for In-Context Learning (ICL) under a dynamic model of examples in the prompt.
claim: Yingqian Cui et al. demonstrate that multi-head attention with a substantial embedding dimension performs better than single-head attention in in-context learning tasks.
claim: Attention structures in Large Language Models (LLMs) boost Bayesian Model Averaging (BMA) implementation, and with sufficient examples in the prompt, attention performs BMA under the Gaussian linear In-Context Learning (ICL) model.
The Synergy of Symbolic and Connectionist AI in LLM-Empowered ... arxiv.org Jul 11, 2024 5 facts
reference: Sewon Min et al. investigated the mechanisms behind in-context learning and the role of demonstrations in large language models.
reference: Sang Michael Xie et al. proposed an explanation of in-context learning in large language models as a form of implicit Bayesian inference.
claim: Once trained, large language models can be fine-tuned with additional data at a lower cost and effort compared to updating Knowledge Graphs, and they can support in-context learning without requiring fine-tuning.
claim: LLM-empowered Autonomous Agents (LAAs) offer unique advantages over Knowledge Graphs (KGs) by mimicking human-like reasoning processes, scaling effectively with large datasets, and leveraging in-context learning without extensive re-training.
claim: Promising future directions for neuro-symbolic AI include neuro-vector-symbolic architectures, which incorporate vector manipulation to enhance agentic reasoning capabilities, and generative encoding, which embeds agentic logical steps into text vectorization for advanced sample selection in in-context learning for LLM-empowered agents.
Unlocking the Potential of Generative AI through Neuro-Symbolic ... arxiv.org Feb 16, 2025 3 facts
claim: In-context learning distillation combines in-context learning objectives with traditional language modeling, allowing smaller models to perform effectively with limited data while maintaining computational efficiency.
claim: The Symbolic[Neuro] approach utilizes neural networks for context-aware predictions, such as in-context learning, few-shot learning, and Chain-of-Thought (CoT) reasoning, while employing symbolic systems to facilitate higher-order reasoning.
reference: Hanlin Zhang, YiFan Zhang, Li Erran Li, and Eric Xing authored 'The impact of symbolic representations on in-context learning for few-shot reasoning', which was presented at the NeurIPS 2022 Workshop on Neuro Causal and Symbolic AI (nCSI).
The Hallucinations Leaderboard, an Open Effort to Measure ... huggingface.co Jan 29, 2024 2 facts
claim: The Hallucinations Leaderboard is a platform designed to evaluate large language models against benchmarks specifically created to assess hallucination-related issues using in-context learning.
procedure: The Hallucinations Leaderboard utilizes the EleutherAI Language Model Evaluation Harness to perform zero-shot and few-shot evaluations of large language models via in-context learning.
Large Language Models Meet Knowledge Graphs for Question ... arxiv.org Sep 22, 2025 1 fact
reference: StraGo, proposed by Wu et al. (2024c), enhances the quality and stability of prompts by using in-context learning to apply insights and strategic guidance learned from historical prompts.
Knowledge Graph-extended Retrieval Augmented Generation for ... arxiv.org Apr 11, 2025 1 fact
procedure: KG-RAG utilizes In-Context Learning (ICL) and Chain-of-Thought (CoT) prompting to generate explicit reasoning chains that are processed separately to improve truthfulness.
The Synergy of Symbolic and Connectionist AI in LLM ... arxiv.org 1 fact
procedure: Agentic reasoning approaches handle ambiguous user requests through a multi-step process: retrieving similar cases, enhancing actions via in-context learning, prompting the LLM to clarify and rewrite the request, extracting vectors for each rewritten request, and performing multi-vector retrieval.
Detecting and Evaluating Medical Hallucinations in Large Vision ... arxiv.org Jun 14, 2024 1 fact
procedure: The method for constructing counterfactual question-answer pairs involves: (1) leveraging GPT-4's in-context learning capabilities to generate pairs without using image information, by incorporating original questions and correct answers into the instructions; (2) having human annotators review and verify the generated pairs to ensure accuracy.
Combining Knowledge Graphs and Large Language Models - arXiv arxiv.org Jul 9, 2024 1 fact
reference: Lee et al. demonstrated that LLMs can learn patterns from historical data in Temporal Knowledge Graphs using in-context learning (ICL) without requiring special architectures or modules.
Building Trustworthy NeuroSymbolic AI Systems - arXiv arxiv.org 1 fact
reference: Meade et al. (2023) proposed using in-context learning to improve dialogue safety in language models.
Applying Large Language Models in Knowledge Graph-based ... arxiv.org Jan 7, 2025 1 fact
reference: Q. Wang, Z. Gao, and R. Xu published 'Exploring the in-context learning ability of large language model for biomedical concept linking' as an arXiv e-print.
Combining large language models with enterprise knowledge graphs frontiersin.org Aug 26, 2024 1 fact
claim: In-context learning offers greater flexibility for adapting to the rapidly evolving field of Large Language Models (LLMs), though prompt engineering is time-consuming and relies on methods that are not universally applicable across models, as reported by Zhao et al. (2024).
Practices, opportunities and challenges in the fusion of knowledge ... frontiersin.org 1 fact
reference: SPINACH, introduced by Liu S. et al. in 2024, provides an expert-annotated knowledge base question answering (KBQA) dataset with in-context learning that outperforms GPT-4 on complex queries.
KG-IRAG: A Knowledge Graph-Based Iterative Retrieval-Augmented ... arxiv.org Mar 18, 2025 1 fact
reference: Dong-Ho Lee, Kian Ahrabian, Woojeong Jin, Fred Morstatter, and Jay Pujara authored 'Temporal knowledge graph forecasting without knowledge using in-context learning', published as an arXiv preprint (arXiv:2305.10613).