measurement
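Setlur et al. (2024) found that, in mathematical reasoning tasks, reinforcement learning on a model's incorrect responses with per-step credit assignment matches the performance of fine-tuning on roughly eight times as much correct synthetic data. The twofold efficiency gain they report comes from a different step: fine-tuning on the model's own self-generated correct solutions.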
