reference
The paper 'RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold' was published in Advances in Neural Information Processing Systems 37, pp. 43000–43031, and is cited in 'A Survey on the Theory and Mechanism of Large Language Models'.
