claim
The f-PO framework unifies earlier alignment algorithms such as DPO (Direct Preference Optimization) and EXO (Efficient Exact Optimization), while offering new variants through different choices of f-divergence.
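
To make "different choices of f-divergence" concrete, here is a minimal sketch of the general definition D_f(p || q) = sum_i q_i * f(p_i / q_i) on small discrete distributions. It is not the f-PO training objective: the function name `f_divergence`, the `GENERATORS` table, and the toy distributions are assumptions made for illustration only. The point is that swapping the generator f changes which divergence is computed while the surrounding code stays the same.

```python
import math

def f_divergence(p, q, f):
    """Return D_f(p || q) = sum_i q_i * f(p_i / q_i).

    Assumes p and q are strictly positive discrete distributions
    of equal length (illustrative helper, not from the source).
    """
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

# Convex generators with f(1) = 0; each one selects a different divergence.
GENERATORS = {
    # f(u) = u log u gives KL(p || q)
    "kl": lambda u: u * math.log(u),
    # f(u) = -log u gives KL(q || p)
    "reverse_kl": lambda u: -math.log(u),
    # f(u) = 0.5 * [u log u - (u + 1) log((u + 1) / 2)] gives Jensen-Shannon
    "jensen_shannon": lambda u: 0.5 * (
        u * math.log(u) - (u + 1) * math.log((u + 1) / 2)
    ),
}

# Toy distributions, chosen arbitrarily for the example.
p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]

for name, f in GENERATORS.items():
    print(f"{name}: {f_divergence(p, q, f):.6f}")
```

Every generator above returns zero when the two distributions coincide and a positive value otherwise, which is the property that lets any of them serve as a distribution-matching objective.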
