
Zhu et al. (2025c) proved that reinforcement learning (RL) updates occur in low-curvature subspaces orthogonal to the principal components updated by supervised fine-tuning (SFT). This suggests that RL operates in a distinct optimization regime, one that fine-tunes behavior without significantly altering the primary feature representations.
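To make "orthogonal to the principal components" concrete, the sketch below measures what fraction of an update's squared norm lies inside a given principal subspace. It is only an illustration, not the procedure used by Zhu et al. (2025c): the function name `principal_energy_fraction`, the 4-dimensional parameter space, and both example update vectors are hypothetical, and the principal directions are assumed to be supplied as orthonormal vectors rather than computed from real model weights.

```python
def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def principal_energy_fraction(update, principal_dirs):
    """Fraction of the update's squared norm inside span(principal_dirs).

    Assumes principal_dirs is a list of orthonormal vectors, so the squared
    projections onto each direction can simply be summed.
    """
    total = dot(update, update)
    if total == 0.0:
        return 0.0
    in_span = sum(dot(update, d) ** 2 for d in principal_dirs)
    return in_span / total


# Hypothetical setup: the first two coordinate axes play the role of the
# principal components; the remaining two are off-principal directions.
principal_dirs = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]

# Made-up update vectors chosen to illustrate the two regimes in the claim.
sft_like_update = [0.9, -0.4, 0.05, 0.0]  # mostly along principal directions
rl_like_update = [0.0, 0.02, 0.3, -0.5]   # mostly orthogonal to them

sft_frac = principal_energy_fraction(sft_like_update, principal_dirs)
rl_frac = principal_energy_fraction(rl_like_update, principal_dirs)

print(f"SFT-like update, principal energy fraction: {sft_frac:.4f}")
print(f"RL-like update, principal energy fraction:  {rl_frac:.4f}")
```

A fraction near 1 means the update mostly moves along the principal directions, while a fraction near 0 means it is almost orthogonal to them, which is the pattern the claim attributes to RL updates.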

Authors

Person: Not available
Organization: arXiv
A Survey on the Theory and Mechanism of Large Language Models

Sources

A Survey on the Theory and Mechanism of Large Language Models (arXiv, arxiv.org)

Referenced by nodes (2)

reinforcement learning concept
supervised fine-tuning concept