Relations (1)
related 1.58 — strongly supporting 2 facts
The RL4HS framework uses chain-of-thought reasoning as a core component of span-level hallucination detection [1], and it is empirically evaluated against chain-of-thought baselines, demonstrating improved performance on hallucination detection tasks [2].
Facts (2)
Sources
EdinburghNLP/awesome-hallucination-detection - GitHub (github.com) - 2 facts
reference: RL4HS is a reinforcement-learning framework for span-level hallucination detection that couples chain-of-thought reasoning with span-level rewards, using Group Relative Policy Optimization (GRPO) and Class-Aware Policy Optimization (CAPO) to address reward imbalance between hallucinated and non-hallucinated spans.
measurement: On the RAGTruth dataset, which covers QA, summarization, and data-to-text tasks, the RL4HS framework improves fine-grained hallucination detection compared to chain-of-thought-based and supervised baselines.
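The reward-imbalance problem named in the reference fact can be illustrated with a minimal sketch. The function below is a hypothetical illustration, not RL4HS's actual CAPO implementation: it reweights each span's reward inversely to its class frequency, so that rare hallucinated spans contribute as much learning signal as the far more common non-hallucinated ones. All names here are assumptions for illustration.

```python
# Hypothetical sketch of class-aware span-reward reweighting.
# NOT the actual RL4HS/CAPO code; names and weighting scheme are assumptions.
from collections import Counter

def class_aware_rewards(span_labels, base_reward=1.0):
    """Scale each span's reward inversely to its class frequency.

    Uses the common balanced-class weight total / (num_classes * count),
    so minority classes (e.g. hallucinated spans) receive larger rewards.
    """
    counts = Counter(span_labels)
    total = len(span_labels)
    num_classes = len(counts)
    return [
        base_reward * total / (num_classes * counts[label])
        for label in span_labels
    ]

# Example: three non-hallucinated spans, one hallucinated span.
labels = ["ok", "ok", "ok", "hallucinated"]
rewards = class_aware_rewards(labels)
# The single hallucinated span gets a 2.0 reward vs ~0.67 for each "ok" span,
# counteracting the class imbalance in the policy-gradient update.
```

In a GRPO-style update, such reweighted per-span rewards would then be normalized within each sampled group before computing the policy gradient; this sketch only covers the class-balancing step.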