reference
The paper "Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key" by Yang et al. (2025) argues that the use of on-policy data is critical for mitigating hallucinations in large vision-language models when using direct preference optimization.

Authors

Sources

Referenced by nodes (3)