reference
HSA-DPO (Hallucination Severity-Aware Direct Preference Optimization) is a method that uses fine-grained AI feedback to label the severity of hallucinations and to weight preference pairs accordingly, so that critical errors are prioritized when training large vision-language models.
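A minimal sketch of the idea, assuming the standard DPO log-sigmoid objective scaled by a per-pair severity weight (the function name, argument layout, and weighting scheme here are illustrative, not the paper's exact formulation):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def severity_weighted_dpo_loss(logp_chosen, logp_rejected,
                               ref_logp_chosen, ref_logp_rejected,
                               severity, beta=0.1):
    # Standard DPO margin: difference of policy-vs-reference log-ratios
    # between the preferred (chosen) and dispreferred (rejected) responses.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Severity-aware twist: scale the per-pair loss by a hallucination
    # severity weight, so pairs whose rejected response contains a
    # critical hallucination contribute more to the gradient.
    return -severity * math.log(sigmoid(margin))

# A pair flagged with a severe hallucination (severity=2.0) incurs
# twice the loss of a mild one (severity=1.0) at the same margin.
mild = severity_weighted_dpo_loss(-1.0, -2.0, -1.5, -1.0, severity=1.0)
severe = severity_weighted_dpo_loss(-1.0, -2.0, -1.5, -1.0, severity=2.0)
```

In practice the per-pair `severity` label would come from the fine-grained AI feedback stage described above.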
Authors
Sources
- EdinburghNLP/awesome-hallucination-detection - GitHub (github.com)
Referenced by nodes (1)
- Large Vision-Language Models (concept)