HSA-DPO (Hallucination Severity-Aware Direct Preference Optimization) is a training method for large vision-language models (LVLMs). It uses fine-grained AI feedback to label how severe each hallucination is, then uses those labels during preference optimization so that critical errors are prioritized over minor ones.
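One way to picture "prioritizing critical errors" is a DPO loss in which each preference pair is weighted by the severity of the hallucination in its rejected response. The sketch below is illustrative and makes several assumptions. Severity is modeled as a positive per-pair weight applied to the standard DPO loss and normalized by the total weight, and the actual HSA-DPO objective may incorporate severity differently. `PreferencePair`, `severity_weighted_dpo_loss`, and the `severity` field are hypothetical names. Log-probabilities are plain floats and stand in for real model outputs.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    # Summed log-probabilities of each response under the policy and the frozen reference model.
    policy_chosen_logp: float
    policy_rejected_logp: float
    ref_chosen_logp: float
    ref_rejected_logp: float
    # Hypothetical severity score for the hallucination in the rejected response (higher = more critical).
    severity: float


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(pair: PreferencePair, beta: float) -> float:
    """Standard DPO loss for one pair: -log sigmoid(beta * margin)."""
    margin = (pair.policy_chosen_logp - pair.ref_chosen_logp) - (
        pair.policy_rejected_logp - pair.ref_rejected_logp
    )
    # -log sigmoid(x) equals softplus(-x)
    return softplus(-beta * margin)


def severity_weighted_dpo_loss(pairs: list[PreferencePair], beta: float = 0.1) -> float:
    """Severity-weighted mean of per-pair DPO losses (the weighting scheme is an assumption)."""
    total_weight = sum(p.severity for p in pairs)
    if total_weight <= 0:
        raise ValueError("severity weights must sum to a positive value")
    return sum(p.severity * dpo_loss(p, beta) for p in pairs) / total_weight


if __name__ == "__main__":
    pairs = [
        PreferencePair(-10.0, -12.0, -11.0, -11.0, severity=1.0),  # minor hallucination
        PreferencePair(-10.0, -9.0, -10.0, -10.0, severity=3.0),   # critical hallucination
    ]
    unweighted = sum(dpo_loss(p, 0.1) for p in pairs) / len(pairs)
    print("unweighted:", unweighted)
    print("severity-weighted:", severity_weighted_dpo_loss(pairs, beta=0.1))
```

In this toy example, the critical pair is also the one the policy currently gets wrong (negative margin). Its larger weight pushes the weighted loss above the unweighted mean, so the critical pair would dominate a gradient update.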
