reference
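Direct preference optimization (DPO), introduced by Rafailov et al. in 2023, is a method for aligning a model's outputs and behavior with human preferences. It fine-tunes the model directly on pairs of preferred and rejected responses, without training a separate reward model.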
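As a rough illustration of the method, the sketch below computes the DPO loss for a single preference pair. It assumes the summed log-probabilities of the chosen and rejected responses are already available from both the model being trained (the policy) and a frozen reference model. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative choices, not taken from the source.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (hypothetical helper, for illustration).

    Each argument is the summed log-probability of a full response.
    beta scales how strongly the policy is pushed away from the reference.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # loss = -log(sigmoid(logits)), written in a numerically stable form.
    if logits > 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

When the policy and reference agree, the loss equals log 2; it falls below that as the policy raises the chosen response relative to the rejected one, compared with the reference.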
Sources
- Medical Hallucination in Foundation Models and Their ... www.medrxiv.org