measurement
On the BERTScore metric, mPLUG-Owl2 scored 64.49% and XrayGPT scored 62.62%, while the BLIP and LLaVA1.5 model families each scored approximately 47%.
Sources
- Detecting and Evaluating Medical Hallucinations in Large Vision ... (arxiv.org)
Referenced by nodes (1)
- BERTScore concept