reference
Benchmarks for evaluating hallucination detection and mitigation in vision-language models include MHaluBench, MFHaluBench, Object HalBench, AMBER, MMHal-Bench, and POPE. They report metrics such as accuracy, precision, recall, F1-score, CHAIR, Cover, Hal, and Cog.
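As a rough illustration of how some of these metrics are computed, the sketch below implements accuracy, precision, recall, and F1 for yes/no hallucination probes, plus the two common CHAIR variants (per-object and per-caption). The function names `binary_metrics` and `chair` and the input formats are assumptions made for this example and are not taken from any of the benchmarks' codebases. Object mentions are treated as per-caption sets, which is a simplification. Official implementations also handle synonym matching and object extraction, both omitted here.

```python
from typing import Dict, Sequence, Set


def binary_metrics(preds: Sequence[bool], labels: Sequence[bool]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1, treating True ("yes") as the positive class."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    tp = sum(1 for p, l in zip(preds, labels) if p and l)
    fp = sum(1 for p, l in zip(preds, labels) if p and not l)
    fn = sum(1 for p, l in zip(preds, labels) if not p and l)
    tn = sum(1 for p, l in zip(preds, labels) if not p and not l)
    total = len(labels)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def chair(mentioned: Sequence[Set[str]], ground_truth: Sequence[Set[str]]) -> Dict[str, float]:
    """CHAIR_i: hallucinated object mentions / all object mentions.
    CHAIR_s: captions with at least one hallucinated object / all captions.
    Assumes objects were already extracted from each caption (hypothetical input format)."""
    if len(mentioned) != len(ground_truth):
        raise ValueError("mentioned and ground_truth must have the same length")
    total_mentions = 0
    hallucinated_mentions = 0
    hallucinated_captions = 0
    for objs, truth in zip(mentioned, ground_truth):
        hallucinated = objs - truth  # mentioned but not present in the image
        total_mentions += len(objs)
        hallucinated_mentions += len(hallucinated)
        hallucinated_captions += 1 if hallucinated else 0
    chair_i = hallucinated_mentions / total_mentions if total_mentions else 0.0
    chair_s = hallucinated_captions / len(mentioned) if mentioned else 0.0
    return {"chair_i": chair_i, "chair_s": chair_s}


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    print(binary_metrics([True, True, False, False], [True, False, True, False]))
    print(chair([{"dog", "frisbee"}, {"cat"}], [{"dog"}, {"cat", "sofa"}]))
```

Lower CHAIR values indicate fewer hallucinated objects, while higher values are better for the classification metrics.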
Sources
- EdinburghNLP/awesome-hallucination-detection (GitHub, github.com)