claim
The LLaVA-Med series, BLIP2, and RadFM cannot produce a computable MediHall Score on the IRG task because their output formats are not suited to report-generation scenarios that require contextual reasoning.
