Direct preference optimization (DPO), introduced by Rafailov et al. in 2023, is a method for aligning language model outputs and behavior with human preferences by optimizing the policy directly on preference pairs, without training a separate reward model or using reinforcement learning.
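A minimal sketch of the DPO objective for a single preference pair may help make this concrete. It assumes the inputs are summed log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model; the function name and `beta` default are illustrative, not from the original paper's code.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (sketch).

    logp_* are log-probabilities of the chosen/rejected responses
    under the policy; ref_logp_* are the same quantities under the
    frozen reference model. beta controls deviation from the reference.
    """
    # Implicit reward margin: difference of policy-vs-reference log-ratios
    # between the preferred and dispreferred responses.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the margin (a logistic loss).
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Minimizing this loss pushes the policy to assign a higher likelihood ratio (relative to the reference) to the preferred response than to the dispreferred one.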
