Relations (1)
related 1.58 — strongly supporting 2 facts
The RL4HS framework uses chain-of-thought reasoning as a core component of span-level hallucination detection [1], and it is empirically evaluated against chain-of-thought baselines, demonstrating improved performance on hallucination detection tasks [2].
Facts (2)
Sources
EdinburghNLP/awesome-hallucination-detection - GitHub (github.com) - 2 facts
reference: RL4HS is a reinforcement-learning framework for span-level hallucination detection that couples chain-of-thought reasoning with span-level rewards, using Group Relative Policy Optimization (GRPO) and Class-Aware Policy Optimization (CAPO) to address reward imbalance between hallucinated and non-hallucinated spans.
measurement: On the RAGTruth dataset, which covers QA, summarization, and data-to-text tasks, the RL4HS framework improves fine-grained hallucination detection compared to chain-of-thought-based and supervised baselines.
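The reward-imbalance problem named in the reference fact can be illustrated with a minimal sketch. The function below is a hypothetical illustration, not RL4HS's actual CAPO implementation: it reweights each span's reward inversely to its class frequency, so that rare hallucinated spans contribute as much learning signal as the far more common non-hallucinated ones. All names here are assumptions for illustration.

```python
# Hypothetical sketch of class-aware span-reward reweighting.
# NOT the actual RL4HS/CAPO code; names and weighting scheme are assumptions.
from collections import Counter

def class_aware_rewards(span_labels, base_reward=1.0):
    """Scale each span's reward inversely to its class frequency.

    Uses the common balanced-class weight total / (num_classes * count),
    so minority classes (e.g. hallucinated spans) receive larger rewards.
    """
    counts = Counter(span_labels)
    total = len(span_labels)
    num_classes = len(counts)
    return [
        base_reward * total / (num_classes * counts[label])
        for label in span_labels
    ]

# Example: three non-hallucinated spans, one hallucinated span.
labels = ["ok", "ok", "ok", "hallucinated"]
rewards = class_aware_rewards(labels)
# The single hallucinated span gets a 2.0 reward vs ~0.67 for each "ok" span,
# counteracting the class imbalance in the policy-gradient update.
```

In a GRPO-style update, such reweighted per-span rewards would then be normalized within each sampled group before computing the policy gradient; this sketch only covers the class-balancing step.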