procedure
Factuality-aware Step-wise Policy Optimization (FSPO) is a reinforcement learning fine-tuning algorithm that incorporates explicit factuality verification at each reasoning step and dynamically adjusts token-level advantage values to maintain factual correctness.
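The per-step adjustment described above can be illustrated with a minimal sketch. The rule used here is an assumption for illustration only, not the published FSPO formula: tokens in a step verified as factual get a non-negative advantage, tokens in a step found non-factual get a non-positive advantage, and tokens in unverifiable steps are left unchanged. The function name `adjust_token_advantages`, the span representation, and the three-valued verdict are all hypothetical.

```python
from typing import List, Optional, Tuple


def adjust_token_advantages(
    advantages: List[float],
    step_spans: List[Tuple[int, int]],
    step_verdicts: List[Optional[bool]],
) -> List[float]:
    """Re-sign token-level advantages using per-step factuality verdicts.

    Hypothetical rule (an assumption, not the paper's exact formula):
      verdict True  -> advantage becomes +|A| (step verified as factual)
      verdict False -> advantage becomes -|A| (step found non-factual)
      verdict None  -> advantage is left unchanged (step not verifiable)

    step_spans holds half-open token index ranges [start, end), one per step.
    """
    if len(step_spans) != len(step_verdicts):
        raise ValueError("need exactly one verdict per reasoning step")

    adjusted = list(advantages)  # copy so the input is not mutated
    for (start, end), verdict in zip(step_spans, step_verdicts):
        for t in range(start, end):
            if verdict is True:
                adjusted[t] = abs(advantages[t])
            elif verdict is False:
                adjusted[t] = -abs(advantages[t])
    return adjusted


# Example: a response with a wrong final answer, so every token starts
# with the same negative sequence-level advantage.
token_advantages = [-0.5] * 6
spans = [(0, 2), (2, 4), (4, 6)]   # three reasoning steps of two tokens each
verdicts = [True, False, None]     # verified, contradicted, unverifiable
result = adjust_token_advantages(token_advantages, spans, verdicts)
```

In this example the tokens of the verified first step are no longer penalized even though the overall response was scored negatively, while the other two steps keep their negative advantage.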
Authors
Sources
- EdinburghNLP/awesome-hallucination-detection - GitHub (github.com)
Referenced by nodes (1)
- reinforcement learning concept