Fact — claim — Knowledge Tree

The f-PO framework unifies earlier alignment algorithms such as DPO (Direct Preference Optimization) and EXO (Efficient Exact Optimization), and yields new variants through different choices of f-divergence.
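
To make "different choices of f-divergences" concrete, here is a minimal Python sketch of a generic discrete f-divergence, D_f(p || q) = sum_i q_i f(p_i / q_i), evaluated with three standard generator functions. It is not the f-PO objective itself. The function names, the toy distributions, and the `margin` value are hypothetical and chosen only for illustration, and the claim above does not say which divergence corresponds to which algorithm. The last lines check one well-known identity: with a one-hot preference target, the forward KL reduces to -log(sigmoid(margin)), which has the same form as the DPO loss.

```python
import math

def f_divergence(p, q, f):
    """D_f(p || q) = sum_i q_i * f(p_i / q_i), for discrete p, q with q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def f_forward_kl(u):
    # Generator u*log(u), with the convention 0*log(0) = 0; gives KL(p || q).
    return u * math.log(u) if u > 0 else 0.0

def f_reverse_kl(u):
    # Generator -log(u); gives KL(q || p). Requires u > 0.
    return -math.log(u)

def f_jensen_shannon(u):
    # Generator of the Jensen-Shannon divergence.
    return 0.5 * (f_forward_kl(u) - (u + 1.0) * math.log((u + 1.0) / 2.0))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy distributions (hypothetical values, for illustration only).
p = [0.5, 0.5]
q = [0.25, 0.75]

generators = {
    "forward_kl": f_forward_kl,
    "reverse_kl": f_reverse_kl,
    "jensen_shannon": f_jensen_shannon,
}
divergences = {name: f_divergence(p, q, f) for name, f in generators.items()}

# Pairwise case: a one-hot target ("chosen beats rejected") against a
# Bernoulli model whose win probability is sigmoid(margin).
margin = 1.2  # hypothetical scaled log-ratio difference
win_prob = sigmoid(margin)
pairwise_loss = f_divergence([1.0, 0.0], [win_prob, 1.0 - win_prob], f_forward_kl)

for name, value in divergences.items():
    print(f"{name}: {value:.4f}")
print(f"pairwise forward-KL loss: {pairwise_loss:.4f}")
```

Swapping the generator passed to `f_divergence` is the only change needed to move between divergences, which is the kind of design freedom the claim refers to.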

Authors

Person: Samuel Tesfazgi, Leonhard Sprandl, Sandra Hirche
Organization: AISTATS
Track: Poster Session 3 - aistats 2026

Sources

Track: Poster Session 3 - aistats 2026 (virtual.aistats.org) · Samuel Tesfazgi, Leonhard Sprandl, Sandra Hirche · AISTATS

Referenced by nodes (1)

Direct Preference Optimization (DPO) concept