Fact — reference — Knowledge Tree

Shao et al. (2024) propose a unified paradigm that encompasses Supervised Fine-Tuning (SFT), Rejection Sampling Fine-Tuning (RFT), Direct Preference Optimization (DPO), and Proximal Policy Optimization (PPO), and on that basis introduce Group Relative Policy Optimization (GRPO).
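As a rough illustration of the "group relative" idea behind GRPO, the sketch below scores each sampled response against the other responses drawn for the same prompt, by subtracting the group mean reward and dividing by the group standard deviation. The function name `group_relative_advantages`, the `eps` stabilizer, and the use of the population standard deviation are assumptions made for this example; they are not taken from the source.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    rewards: scalar rewards for several responses to ONE prompt.
    eps: hypothetical stabilizer to avoid division by zero when all
         rewards in the group are identical (an assumption of this sketch).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; the choice is an assumption
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt, two rewarded and two not.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses that beat the group average receive a positive advantage and the rest a negative one, so no separately trained value model is needed as a baseline in this simplified view.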

Authors

Person: Not available
Organization: arXiv
A Survey on the Theory and Mechanism of Large Language Models

Sources

A Survey on the Theory and Mechanism of Large Language Models (arXiv, arxiv.org)

Referenced by nodes (2)

Supervised fine-tuning (concept)
Direct Preference Optimization (DPO) (concept)