reference
Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, and Dorsa Sadigh proposed Contrastive Preference Learning (CPL), a method for learning from human feedback without reinforcement learning, in a 2024 paper presented at the Twelfth International Conference on Learning Representations (ICLR).
