Direct preference optimization (DPO), introduced by Rafailov et al. in 2023, is a method for aligning language model outputs and behavior with human preferences by optimizing the policy directly on preference pairs, without training a separate reward model or using reinforcement learning.
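A minimal sketch of the DPO objective for a single preference pair may help make this concrete. It assumes the inputs are summed log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model; the function name and `beta` default are illustrative, not from the original paper's code.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (sketch).

    logp_* are log-probabilities of the chosen/rejected responses
    under the policy; ref_logp_* are the same quantities under the
    frozen reference model. beta controls deviation from the reference.
    """
    # Implicit reward margin: difference of policy-vs-reference log-ratios
    # between the preferred and dispreferred responses.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the margin (a logistic loss).
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Minimizing this loss pushes the policy to assign a higher likelihood ratio (relative to the reference) to the preferred response than to the dispreferred one.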
