reference
RL4HS is a reinforcement-learning framework for span-level hallucination detection that couples chain-of-thought reasoning with span-level rewards. It builds on Group Relative Policy Optimization (GRPO) and adds Class-Aware Policy Optimization (CAPO) to address the reward imbalance between hallucinated and non-hallucinated spans.
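The following minimal Python sketch shows how span-level rewards and group-relative advantages can fit together. It is an illustration under stated assumptions, not the RL4HS implementation. The entry does not specify any of the following, so they are assumptions: the reward (character-overlap F1 between predicted and gold spans, scoring 1.0 when both are empty), the function names, and the scaling factor `alpha`. In particular, `class_aware_scale` is a hypothetical stand-in for class-aware reweighting and is not necessarily how CAPO is defined.

```python
from statistics import mean, pstdev

# Assumption: a span is a (start, end) character range, end exclusive.
Span = tuple[int, int]


def span_f1(pred: list[Span], gold: list[Span]) -> float:
    """Character-level F1 between predicted and gold hallucination spans.

    Assumption: when neither side marks any span (no hallucination),
    the prediction is counted as fully correct (reward 1.0).
    """
    pred_chars = {i for s, e in pred for i in range(s, e)}
    gold_chars = {i for s, e in gold for i in range(s, e)}
    if not pred_chars and not gold_chars:
        return 1.0
    tp = len(pred_chars & gold_chars)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_chars)
    recall = tp / len(gold_chars)
    return 2 * precision * recall / (precision + recall)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: normalize each sampled completion's reward
    by the mean and (population) standard deviation of its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def class_aware_scale(
    advantages: list[float],
    is_no_hallucination: list[bool],
    alpha: float = 0.5,
) -> list[float]:
    """Hypothetical class-aware reweighting: multiply the advantage of
    completions in the 'no hallucination' class by alpha, so that the
    easier class does not dominate the policy update."""
    return [a * alpha if no_h else a for a, no_h in zip(advantages, is_no_hallucination)]


if __name__ == "__main__":
    gold = [(10, 20)]
    # One group of sampled completions for the same prompt (illustrative data).
    group = [[(10, 20)], [(15, 25)], [], [(0, 5)]]
    rewards = [span_f1(p, gold) for p in group]  # rewards == [1.0, 0.5, 0.0, 0.0]
    advantages = group_relative_advantages(rewards)
    scaled = class_aware_scale(advantages, [len(p) == 0 for p in group])
    print(rewards, advantages, scaled)
```

Normalizing within a group means a completion is rewarded only relative to its siblings for the same prompt, so GRPO needs no learned value function. The class-aware step is kept separate so that it can be swapped for whatever rule CAPO actually uses.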
Authors
Sources
- EdinburghNLP/awesome-hallucination-detection - GitHub (github.com)
Referenced by nodes (3)
- hallucination detection concept
- chain-of-thought concept
- reinforcement learning concept