claim
The LLaVA-Med series, BLIP2, and RadFM cannot produce a computable MediHall Score on the IRG task because their output formats are not suited to report generation scenarios, which require contextual reasoning.
Authors
Sources
- Detecting and Evaluating Medical Hallucinations in Large Vision ... arxiv.org via serper
Referenced by nodes (1)
- MediHall Score concept