Claim
The f-PO framework unifies earlier alignment algorithms such as DPO (Direct Preference Optimization) and EXO (Efficient Exact Optimization), and yields new variants through different choices of f-divergence.
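
To make the "different choices of f-divergence" idea concrete, here is a minimal sketch, not the authors' implementation. It assumes the objective compares a reward-induced target distribution over a handful of candidate responses with a policy-induced distribution over the same candidates, using the standard definition D_f(p || q) = sum_i q_i f(p_i / q_i). The names `f_po_loss` and `F_GENERATORS`, the softmax parameterization, the `beta` scale, and the direction of the divergence are all illustrative assumptions. Which generator corresponds to DPO and which to EXO is not stated in the source and is left open here.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def f_divergence(p, q, f):
    """Standard f-divergence D_f(p || q) = sum_i q_i * f(p_i / q_i).

    Assumes every entry of p and q is strictly positive.
    """
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

# Convex generators with f(1) = 0. Swapping the generator swaps the objective.
F_GENERATORS = {
    "kl": lambda u: u * math.log(u),        # gives KL(p || q)
    "reverse_kl": lambda u: -math.log(u),   # gives KL(q || p)
    "jensen_shannon": lambda u: 0.5 * (
        u * math.log(u) - (u + 1.0) * math.log((u + 1.0) / 2.0)
    ),                                      # gives JS(p, q)
}

def f_po_loss(policy_logratios, rewards, f, beta=1.0):
    """Hypothetical f-PO-style loss over a small set of candidate responses.

    policy_logratios: log pi_theta(y_i | x) - log pi_ref(y_i | x) per candidate.
    rewards: scalar reward per candidate (assumed given).
    The divergence direction used here is an assumption of this sketch.
    """
    p_target = softmax(rewards)
    p_policy = softmax([beta * r for r in policy_logratios])
    return f_divergence(p_target, p_policy, f)

if __name__ == "__main__":
    logratios = [0.3, -0.2]   # made-up values for illustration
    rewards = [1.0, -1.0]     # made-up values for illustration
    for name, f in F_GENERATORS.items():
        print(name, f_po_loss(logratios, rewards, f))
```

Each generator yields a non-negative loss that vanishes exactly when the policy-induced distribution matches the target, so changing `f` changes how mismatches are penalized without changing the optimum.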
Authors
Sources
- Track: Poster Session 3, AISTATS 2026 (virtual.aistats.org)