Relations (1)

related (strength 1.58), strongly supported by 2 facts

Reinforcement learning from human feedback (RLHF) is identified as a primary strategy for mitigating model-level hallucinations in large language models, as supported by [1] and [2].
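To make the RLHF component of this relation concrete, the sketch below shows the pairwise (Bradley-Terry) reward-model loss commonly used in RLHF pipelines such as Ouyang et al. (2022). The RewardModel class, feature vectors, and dimensions are hypothetical toy stand-ins for illustration, not details taken from the cited sources.

```python
# Minimal sketch of the reward-model objective used in RLHF (assumptions noted below).
# Toy setup: scalar rewards for "chosen" vs. "rejected" responses; a real system
# would score full (prompt, response) pairs with a language-model-based scorer.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Toy reward model: maps a fixed-size feature vector to a scalar reward."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.scorer(features).squeeze(-1)

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry pairwise loss: push the preferred response's reward above the other's.
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

# Toy batch of feature vectors standing in for encoded (prompt, response) pairs.
model = RewardModel()
chosen = torch.randn(4, 16)
rejected = torch.randn(4, 16)
loss = preference_loss(model(chosen), model(rejected))
loss.backward()  # gradients would drive one step of reward-model training
print(float(loss))
```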

Facts (2)

Sources
Survey and analysis of hallucinations in large language models (Frontiers, frontiersin.org): 2 facts
procedure: Mitigation strategies for large language model hallucinations at the modeling level include Reinforcement Learning from Human Feedback (RLHF) (Ouyang et al., 2022), retrieval fusion (Lewis et al., 2020), and instruction tuning (Wang et al., 2022).
procedure: Techniques such as Reinforcement Learning from Human Feedback (RLHF) (Ouyang et al., 2022) and Retrieval-Augmented Generation (RAG) (Lewis et al., 2020) are used to address model-level limitations regarding hallucinations.
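To illustrate the retrieval fusion / RAG fact above, here is a minimal sketch of retrieving supporting passages and prepending them to the prompt so the generator can answer from evidence rather than from parametric memory alone. The corpus, embed(), retrieve(), and build_prompt() helpers are hypothetical simplifications, not the method of Lewis et al. (2020) itself, which trains a dense retriever and marginalizes over retrieved documents.

```python
# Minimal sketch of retrieval-augmented generation (RAG): condition the model on
# retrieved passages. All names and data below are hypothetical stand-ins.
import numpy as np

corpus = [
    "RLHF fine-tunes a model against a learned reward of human preferences.",
    "Retrieval-augmented generation conditions the model on retrieved passages.",
    "Instruction tuning trains models on instruction-response pairs.",
]

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Hypothetical embedding: hash words into a fixed-size vector (a real system
    # would use a trained dense encoder).
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

def retrieve(query: str, k: int = 2) -> list[str]:
    # Rank passages by cosine similarity to the query embedding.
    q = embed(query)
    scores = [float(q @ embed(doc)) for doc in corpus]
    top = np.argsort(scores)[::-1][:k]
    return [corpus[i] for i in top]

def build_prompt(question: str) -> str:
    # Prepend retrieved evidence so the answer is grounded in it.
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("How does retrieval fusion reduce hallucinations?"))
```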