reference
Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, and Dorsa Sadigh proposed contrastive preference learning (CPL), a method for learning from human feedback without reinforcement learning, in a 2024 paper presented at the Twelfth International Conference on Learning Representations (ICLR 2024).
Sources
- A Survey of Incorporating Psychological Theories in LLMs (arXiv)
Referenced by nodes (1)
- reinforcement learning concept