Relations (1)

related (score 13.00) — strongly supported by 13 facts

Large Language Models and reinforcement learning are closely linked: reinforcement learning is a primary technique for fine-tuning these models and incentivizing their reasoning capabilities, as evidenced by [1], [2], and [3]. Further research examines whether reinforcement learning elicits latent reasoning behaviors [4], [5], applies it to mitigate hallucinations [6], and borrows its state-abstraction theory to probe world representations in Large Language Models [7].

Facts (13)

Sources
A Survey on the Theory and Mechanism of Large Language Models — arxiv.org (arXiv) — 7 facts
reference: The paper 'Deepseek-r1: incentivizing reasoning capability in llms via reinforcement learning' (arXiv:2501.12948) is cited in the survey 'A Survey on the Theory and Mechanism of Large Language Models' regarding reasoning capabilities.
reference: The paper 'ProRL: prolonged reinforcement learning expands reasoning boundaries in large language models' was published in the International Conference on Machine Learning, pp. 4051–4060, and is cited in section 7.2.2 of the survey.
claim: A central debate in the theoretical community concerns whether reinforcement learning (RL) truly instills new reasoning capabilities in Large Language Models or merely elicits latent abilities acquired during pre-training.
claim: Liu et al. (2025d) demonstrated that, with sufficient training duration and periodic policy resets, reinforcement learning can drive Large Language Models to explore novel strategies absent from the base model, thereby expanding the reasoning boundary.
claim: The paper 'All roads lead to likelihood: the value of reinforcement learning in fine-tuning' (arXiv:2503.01067) analyzes the role and value of reinforcement learning in fine-tuning large language models.
claim: Shao et al. (2025) found that even weak or random reward signals can significantly improve mathematical reasoning in Large Language Models, because reinforcement learning activates valid reasoning modes (such as code-based reasoning) already present in the pre-trained model.
reference: The paper 'Detecting data contamination from reinforcement learning post-training for large language models' is an arXiv preprint, arXiv:2510.09259.
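The periodic-reset idea attributed to Liu et al. (2025d) above can be sketched as a toy optimization loop: a policy parameter is pushed toward higher reward, and every so often it is interpolated back toward a fixed reference ("base model") to restore exploration. This is a minimal illustration of the mechanism, not the papers' actual algorithm; every name and constant here is hypothetical.

```python
import random


def periodic_reset_rl(num_steps, reset_every, lr=0.5, seed=0):
    """Toy sketch of prolonged RL with periodic policy resets.

    A single scalar "policy" parameter climbs a noisy reward gradient
    (reward peaks at theta = 2.0); every `reset_every` steps the parameter
    is interpolated halfway back toward the reference policy theta_ref.
    Purely illustrative of the reset mechanic, not the ProRL method.
    """
    rng = random.Random(seed)
    theta_ref = 0.0  # reference ("base model") parameter
    theta = theta_ref
    for step in range(1, num_steps + 1):
        # noisy estimate of the reward gradient at theta
        grad = (2.0 - theta) + rng.gauss(0.0, 0.1)
        theta += lr * grad
        if step % reset_every == 0:
            # periodic reset: pull the policy back toward the reference
            theta = 0.5 * (theta + theta_ref)
    return theta
```

With frequent resets the parameter hovers between the reference and the reward optimum; with resets disabled (a very large `reset_every`) it simply converges to the optimum, which is the contrast the reset schedule is meant to exploit.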
Unlocking the Potential of Generative AI through Neuro-Symbolic ... — arxiv.org (arXiv) — 1 fact
reference: The paper 'Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning' was published as an arXiv preprint in 2025.
LLM Hallucinations: Causes, Consequences, Prevention - LLMs — llmmodels.org — 1 fact
claim: Reinforcement learning is an emerging technique for mitigating LLM hallucinations: large language models are trained with a reward function that penalizes hallucinated outputs.
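The reward shape described in the claim above can be illustrated with a toy function: each claim in a model's answer earns a bonus if it is supported by a verified-fact set and a larger penalty if it is not. Real systems use learned reward models rather than exact set membership; the function, parameter names, and weights below are all hypothetical.

```python
def hallucination_penalty_reward(answer_claims, verified_facts,
                                 support_bonus=1.0,
                                 hallucination_penalty=2.0):
    """Toy reward for hallucination-penalizing RL fine-tuning.

    +support_bonus for each claim found in the verified-fact set,
    -hallucination_penalty for each unsupported (hallucinated) claim.
    Illustrative only: production systems score claims with a learned
    reward model or an LLM judge, not exact string matching.
    """
    reward = 0.0
    for claim in answer_claims:
        if claim in verified_facts:
            reward += support_bonus
        else:
            reward -= hallucination_penalty
    return reward
```

An asymmetric penalty (here 2.0 vs. 1.0) encodes the usual design choice that a confident fabrication should cost more than a supported statement earns.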
Detecting hallucinations with LLM-as-a-judge: Prompt ... — datadoghq.com (Aritra Biswas, Noé Vernier · Datadog) — 1 fact
claim: Large-scale reinforcement learning in Large Language Models elicits reasoning behaviors, such as hypothesis generation and self-criticism, as emergent properties.
A Comprehensive Benchmark and Evaluation Framework for Multi ... — arxiv.org (arXiv) — 1 fact
reference: DeepSeek-AI published the DeepSeek-R1 technical report in 2025, detailing the use of reinforcement learning to incentivize reasoning capabilities in large language models.
Knowledge graphs - Amazon Science — amazon.science — 1 fact
claim: The Amazon research lab combines large language models (LLMs) with reinforcement learning (RL) to tackle reasoning, planning, and world modeling in both virtual and physical environments.
Do LLMs Build World Representations? Probing Through the Lens of... — openreview.net (OpenReview) — 1 fact
claim: The authors of 'Do LLMs Build World Representations? Probing Through the Lens ...' propose a framework for probing world representations in Large Language Models using state-abstraction theory from reinforcement learning, distinguishing general abstractions, which help predict future states, from goal-oriented abstractions, which guide actions toward completing tasks.