measurement
The BLIP family of models achieves an average ROUGE score of 7.35% on the Med-VQA task of the Med-HallMark benchmark.
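For orientation, the sketch below shows one way an average ROUGE percentage of this kind could be computed. It rests on assumptions not stated in the source: the ROUGE-L F1 variant, lowercased whitespace tokenization, and an unweighted mean over reference/answer pairs. The function names (`lcs_length`, `rouge_l_f1`, `mean_rouge_l_percent`) are hypothetical and are not taken from the benchmark's code.

```python
from typing import Iterable, List, Tuple


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 for one pair (assumed variant; tokens split on whitespace)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def mean_rouge_l_percent(pairs: Iterable[Tuple[str, str]]) -> float:
    """Unweighted mean ROUGE-L F1 over (reference, candidate) pairs, as a percentage."""
    scores = [rouge_l_f1(ref, cand) for ref, cand in pairs]
    return 100.0 * sum(scores) / len(scores) if scores else 0.0
```

Under these assumptions, a reported 7.35% would correspond to a mean per-pair F1 of 0.0735. The benchmark's actual ROUGE variant and aggregation may differ.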
Sources
- Detecting and Evaluating Medical Hallucinations in Large Vision ... arxiv.org
Referenced by nodes (1)
- ROUGE concept