reference
RL4HS is a reinforcement-learning framework for span-level hallucination detection that couples chain-of-thought reasoning with span-level rewards. It builds on Group Relative Policy Optimization (GRPO) and adds Class-Aware Policy Optimization (CAPO) to address the reward imbalance between hallucinated and non-hallucinated spans.
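As a rough illustration of how these pieces could fit together, the sketch below computes a span-level reward and turns a group of rewards into group-relative advantages. It is a minimal sketch, not the RL4HS implementation: the character-level F1 reward, the function names, the `predicts_no_span` flag, and the `alpha` scaling factor are all assumptions made for this example.

```python
from statistics import mean, pstdev


def span_f1(pred, gold):
    """Character-level F1 between predicted and gold spans.

    Spans are half-open (start, end) character offsets. Assumption: the
    span-level reward is an overlap F1; the entry does not specify it.
    """
    pred_chars = {i for start, end in pred for i in range(start, end)}
    gold_chars = {i for start, end in gold for i in range(start, end)}
    if not pred_chars and not gold_chars:
        return 1.0  # correctly predicting "no hallucination"
    overlap = len(pred_chars & gold_chars)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_chars)
    recall = overlap / len(gold_chars)
    return 2 * precision * recall / (precision + recall)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def class_aware_advantages(advantages, predicts_no_span, alpha=0.5):
    """Hypothetical class-aware rescaling.

    Assumption: advantages of samples that predict no hallucinated span are
    scaled by alpha so that class does not dominate the policy update.
    """
    return [a * alpha if empty else a
            for a, empty in zip(advantages, predicts_no_span)]


# One prompt, four sampled responses, gold hallucinated span at characters 10-20.
gold = [(10, 20)]
samples = [[(10, 20)], [(12, 20)], [], [(30, 40)]]
rewards = [span_f1(pred, gold) for pred in samples]
advantages = group_relative_advantages(rewards)
scaled = class_aware_advantages(advantages, [not pred for pred in samples])
```

In this sketch, only the response that predicts no span has its advantage rescaled; the other three keep their group-normalized values.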
