reference
The paper 'Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint' was published in the proceedings of the International Conference on Machine Learning, pages 54715–54754.
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org)