Generalization
Facts (33)
Sources
A Survey on the Theory and Mechanism of Large Language Models (arxiv.org, Mar 12, 2026; 15 facts)
Claim: Vasudeva et al. (2025) showed through theoretical and empirical results that SGD exhibits a simplicity bias leading to weaker generalization under data distribution changes, while Adam is more resistant to this bias and more robust under distribution shifts.
Reference: The paper 'Generalization v.s. memorization: tracing language models' capabilities back to pretraining data' investigates the relationship between memorization and generalization in language models.
Claim: Swamy et al. (2025) attributed the superiority of Reinforcement Learning (RL) in generalization to the 'generation-verification gap,' arguing that in many reasoning tasks, learning a verifier is significantly easier than learning a generator.
Reference: Dehghani et al. (2018) introduced the Universal Transformer, which improves generalization by sharing parameters across layers and allowing the model to flexibly adjust its iterative depth (a weight-sharing sketch follows this source's facts).
Claim: Wang et al. (2025d) found that factual question answering tasks demonstrate the strongest memorization effect, which increases with model size, whereas tasks like machine translation and reasoning exhibit greater generalization.
Claim: Jiang et al. (2025) introduced a differentiable adaptation matrix (DAM) to dynamically select modules for LoRA adaptation, and theoretically proved that this selective approach improves convergence speed and generalization (a gated-LoRA sketch follows this source's facts).
Claim: Muon and Spectral Descent (Bernstein and Newhouse, 2024; 2025) exhibit an implicit bias toward solutions maximizing margins under the spectral norm, which offers potential generalization benefits (a spectral-descent sketch follows this source's facts).
Reference: The paper 'Towards a theoretical understanding to the generalization of rlhf' is available as arXiv preprint arXiv:2601.16403.
Reference: The paper 'Transformers as algorithms: generalization and stability in in-context learning' is available as arXiv preprint arXiv:2301.07067.
Reference: The paper 'Discrepancies are virtue: weak-to-strong generalization through lens of intrinsic dimension' was published at the Forty-second International Conference on Machine Learning.
Reference: The paper 'On the generalization ability of unsupervised pretraining' was published in the International Conference on Artificial Intelligence and Statistics, pp. 4519–4527.
Reference: The paper 'Training nonlinear transformers for chain-of-thought inference: a theoretical generalization analysis' provides a theoretical analysis of how nonlinear transformers generalize when trained for chain-of-thought inference.
Claim: Li and Flanigan (2024) found that a model's superior performance in zero- or few-shot settings may stem from exposure to task-related samples during pre-training rather than genuine generalization.
Claim: Chu et al. (2025) provided empirical evidence that Supervised Fine-Tuning (SFT) tends to memorize training data, leading to poor performance on out-of-distribution (OOD) tasks, whereas Reinforcement Learning (RL) demonstrates superior generalization capabilities.
Reference: The paper 'Debate helps weak-to-strong generalization' investigates the role of debate in improving the generalization capabilities of models from weak to strong performance.
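To make the Universal Transformer fact above concrete, here is a minimal PyTorch sketch of the weight-sharing idea: a single layer's parameters are reused at every depth step, so depth becomes an iteration count that can be varied at inference time. The layer sizes, step counts, and class name are illustrative assumptions; the published model also adds per-step timestep signals and adaptive halting, which this sketch omits.

```python
# Hedged sketch of Universal Transformer weight sharing (Dehghani et al., 2018).
# One shared layer stands in for a stack of distinct layers; "depth" is just
# how many times that layer is applied. Sizes and step counts are illustrative.
import torch
import torch.nn as nn

class SharedDepthEncoder(nn.Module):
    def __init__(self, d_model=64, nhead=4, max_steps=6):
        super().__init__()
        # A single layer whose parameters are reused at every depth step.
        self.shared_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True
        )
        self.max_steps = max_steps

    def forward(self, x, steps=None):
        # Iterative depth: apply the same parameters `steps` times.
        for _ in range(steps or self.max_steps):
            x = self.shared_layer(x)
        return x

encoder = SharedDepthEncoder()
x = torch.randn(2, 10, 64)      # (batch, sequence, d_model)
shallow = encoder(x, steps=2)   # fewer refinement iterations
deep = encoder(x, steps=6)      # same parameters, more compute
```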
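The module-selection claim for Jiang et al. (2025) can be illustrated, under assumptions, by attaching a differentiable gate to each LoRA adapter so that "which modules to adapt" is optimized by gradient descent alongside the adapters themselves. The sigmoid-gate form and all names and hyperparameters below are illustrative guesses, not the paper's exact DAM construction.

```python
# Hedged sketch of differentiable LoRA module selection: a learnable scalar
# gate scales this module's low-rank update, so selection is trained jointly.
import torch
import torch.nn as nn

class GatedLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=8, alpha=16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                      # frozen pretrained weights
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.gate_logit = nn.Parameter(torch.zeros(()))  # differentiable selector
        self.scaling = alpha / rank

    def forward(self, x):
        gate = torch.sigmoid(self.gate_logit)  # in (0, 1), learned end to end
        lora_update = (x @ self.A.T) @ self.B.T
        return self.base(x) + gate * self.scaling * lora_update

layer = GatedLoRALinear(nn.Linear(32, 32))
y = layer(torch.randn(4, 32))   # a gate near 0 effectively deselects the module
```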
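The spectral-norm bias attributed to Muon and Spectral Descent corresponds to steepest descent measured in the spectral norm: the gradient matrix G = U S V^T is replaced by its orthogonalized form U V^T before the weight update. The plain-SVD version below is a simplification for clarity; production Muon approximates U V^T with a Newton-Schulz iteration and adds momentum, which this sketch omits.

```python
# Hedged sketch of a spectral-descent update: set every singular value of the
# gradient to 1 (G = U S V^T -> U V^T), then step in that direction.
import torch

def spectral_descent_step(weight, grad, lr):
    U, _, Vh = torch.linalg.svd(grad, full_matrices=False)
    ortho_grad = U @ Vh              # steepest-descent direction in spectral norm
    return weight - lr * ortho_grad

W = torch.randn(16, 16)
G = torch.randn(16, 16)              # stand-in for a loss gradient
W_next = spectral_descent_step(W, G, lr=0.02)
```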
Unlocking the Potential of Generative AI through Neuro-Symbolic ... (arxiv.org, Feb 16, 2025; 7 facts)
Procedure: The study evaluates neuro-symbolic (NSAI) architectures against criteria including generalization, scalability, data efficiency, reasoning, robustness, transferability, and interpretability.
Claim: The 'Neuro → Symbolic ← Neuro' model consistently outperforms other neuro-symbolic architectures across all evaluation metrics, including generalization, reasoning capabilities, transferability, and interpretability.
Claim: Generalization in Neuro-Symbolic AI (NSAI) architectures is evaluated along two axes: out-of-distribution (OOD) performance, the ability to maintain performance on data that deviates from the training distribution, and contextual flexibility, the capacity to adapt to changes in context or domain with minimal retraining.
Claim: Neuro-Symbolic AI (NSAI) systems aim to provide enhanced generalization, interpretability, and robustness by combining the adaptability of neural networks with the explicit reasoning capabilities of symbolic methods.
Claim: Generalization in Neuro-Symbolic AI (NSAI) architectures is defined as the capability of a model to extend learned representations beyond the training dataset to perform effectively in novel or unforeseen situations.
Claim: The 'Neuro → Symbolic ← Neuro' architecture is identified as the most balanced and robust solution among those investigated, demonstrating superior performance in generalization, scalability, and interpretability.
Claim: Neuro-symbolic artificial intelligence (NSAI) aims to enhance generalization, reasoning, and scalability in AI systems while addressing challenges related to transparency and data efficiency.
Practices, opportunities and challenges in the fusion of knowledge ... (frontiersin.org; 2 facts)
Claim: Knowledge-Driven Fine-Tuning is a research approach that incorporates structured knowledge from knowledge graphs during large language model (LLM) adaptation to improve generalization and knowledge-awareness.
Claim: KG-Agent (Jiang J. et al., 2024) relies on predefined rules, which results in limited generalization and high maintenance costs.
A Comprehensive Review of Neuro-symbolic AI for Robustness ... (link.springer.com, Dec 9, 2025; 2 facts)
Reference: Lake, B. and Baroni, M. analyzed the compositional skills of sequence-to-sequence recurrent networks, concluding that they exhibit generalization without systematicity.
Claim: Robustness in AI models is defined as the ability to maintain stable and reliable performance when subjected to varied and unexpected conditions, extending beyond training data accuracy to include generalization across real-world scenarios.
Understanding LLM Understanding (skywritingspress.ca, Jun 14, 2024; 2 facts)
Reference: Bengio authored the work titled 'Conscious processing, inductive biases and generalization in deep learning,' which examines the intersection of conscious processing, inductive biases, and generalization in deep learning systems.
Claim: The Tolerance Principle of learning by satisficing provides a precise and parameter-free measure of what constitutes sufficient generalization.
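For context, the Tolerance Principle admits a one-line statement; the formulation below is the standard one due to Charles Yang, supplied from general knowledge rather than quoted from the cited page.

```latex
% Tolerance Principle: a rule covering N candidate items, of which e are
% exceptions, is adopted (generalized) by the learner if and only if
e \;\le\; \theta_N \;=\; \frac{N}{\ln N}
% The threshold \theta_N depends only on N, which is why the criterion
% is parameter-free.
```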
On Hallucinations in Artificial Intelligence–Generated Content ... (jnm.snmjournals.org; 1 fact)
Claim: Transfer learning, which involves leveraging publicly pretrained models and fine-tuning them on local data, is an effective strategy for balancing generalization and specialization to mitigate hallucinations.
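A minimal sketch of the transfer-learning recipe in the fact above, assuming a torchvision image classifier as the publicly pretrained model: freeze the generic backbone to preserve its general features, and fine-tune only a small head on local data for specialization. The model choice, class count, and layer split are illustrative assumptions, not prescriptions from the cited article.

```python
# Hedged sketch: fine-tune a publicly pretrained backbone on local data.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # pretrained
for p in model.parameters():
    p.requires_grad_(False)                    # freeze backbone: keep general features
model.fc = nn.Linear(model.fc.in_features, 5)  # new head for 5 local classes

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

x = torch.randn(8, 3, 224, 224)   # stand-in local batch
y = torch.randint(0, 5, (8,))
loss = criterion(model(x), y)
loss.backward()
optimizer.step()                  # only the new head is updated
```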
Rationalism Vs. Empiricism 101: Which One is Right? (thecollector.com, Nov 9, 2023; 1 fact)
Claim: Empiricism struggles to explain complex thought processes such as synthesis, abstraction, generalization, specification, deduction, and induction, which are necessary for understanding abstract concepts in mathematics, natural sciences, and social disciplines.
The construction and refined extraction techniques of knowledge ... (nature.com, Feb 10, 2026; 1 fact)
Claim: The study reports that refined task planning and data partitioning reduce multi-task training complexity and improve model training efficiency and generalization.
LLM-empowered knowledge graph construction: A survey (arxiv.org, Oct 23, 2025; 1 fact)
Claim: Traditional Knowledge Extraction methods are constrained by data scarcity, weak generalization, and cumulative error propagation.
Track: Poster Session 3, AISTATS 2026 (virtual.aistats.org; 1 fact)
Claim: There has been significant recent interest in understanding the implicit bias of gradient descent optimization and its connection to the generalization properties of overparametrized neural networks.