reference
Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D. Manning, Stefano Ermon, and Chelsea Finn authored "Direct Preference Optimization: Your Language Model is Secretly a Reward Model", published in Advances in Neural Information Processing Systems (NeurIPS) in 2023.
