procedure
Factuality-aware Step-wise Policy Optimization (FSPO) is a reinforcement learning fine-tuning algorithm that incorporates explicit factuality verification at each reasoning step and dynamically adjusts token-level advantage values to maintain factual correctness.
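The token-level adjustment described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the published FSPO rule: the function name `adjust_token_advantages`, the three-valued verdict (`1` supported, `-1` contradicted, `0` unverifiable), and the sign-based adjustment are all hypothetical. It assumes each response has one sequence-level advantage, that a verifier has already labelled every reasoning step, and that tokens in a supported step should never be penalized while tokens in a contradicted step should never be rewarded.

```python
from typing import List


def adjust_token_advantages(
    advantage: float,
    step_token_counts: List[int],
    step_factuality: List[int],
) -> List[float]:
    """Spread one sequence-level advantage over tokens, step by step.

    Hypothetical rule (an assumption, not the published FSPO formula):
      verdict  1 (supported)    -> tokens get +|advantage|
      verdict -1 (contradicted) -> tokens get -|advantage|
      verdict  0 (unverifiable) -> tokens keep the original advantage
    """
    if len(step_token_counts) != len(step_factuality):
        raise ValueError("need exactly one factuality verdict per step")

    adjusted: List[float] = []
    for n_tokens, verdict in zip(step_token_counts, step_factuality):
        if verdict > 0:
            step_advantage = abs(advantage)
        elif verdict < 0:
            step_advantage = -abs(advantage)
        else:
            step_advantage = advantage
        # Every token in the step shares the step's adjusted advantage.
        adjusted.extend([step_advantage] * n_tokens)
    return adjusted


if __name__ == "__main__":
    # Three steps of 2, 1, and 3 tokens in a response with a positive advantage.
    print(adjust_token_advantages(0.5, [2, 1, 3], [1, -1, 0]))
```

In this sketch a response with a correct final answer still receives a negative signal on the tokens of a contradicted step, which is one simple way to keep a policy from being rewarded for factually wrong intermediate reasoning.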
