Fact — measurement — Knowledge Tree

On the BERTScore metric, mPLUG-Owl2 scored 64.49% and XrayGPT scored 62.62%, while models in the BLIP and LLaVA1.5 families scored approximately 47%.
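To show what a BERTScore value measures, here is a minimal sketch of its greedy-matching formulation. Each reference token is matched to its most similar candidate token by cosine similarity to give recall, and the reverse direction gives precision. Their harmonic mean is F1. The sketch assumes precomputed token embeddings, given here as toy vectors. It uses no idf weighting and no baseline rescaling. The node does not say which BERTScore variant, backbone model, or rescaling the paper used, and the function name `bertscore_f1` is hypothetical.

```python
import math


def _normalize(v):
    # Scale a vector to unit length so a dot product equals cosine similarity.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cos(a, b):
    # Dot product of two already-normalized vectors (cosine similarity).
    return sum(x * y for x, y in zip(a, b))


def bertscore_f1(ref_emb, cand_emb):
    """Hypothetical greedy-matching BERTScore F1 over precomputed token embeddings.

    Assumption: plain cosine matching, no idf weighting, no baseline rescaling.
    """
    ref = [_normalize(v) for v in ref_emb]
    cand = [_normalize(v) for v in cand_emb]
    # Recall: each reference token takes its best-matching candidate token.
    recall = sum(max(_cos(r, c) for c in cand) for r in ref) / len(ref)
    # Precision: each candidate token takes its best-matching reference token.
    precision = sum(max(_cos(c, r) for r in ref) for c in cand) / len(cand)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: the candidate covers only one of the two reference "tokens".
reference = [[1.0, 0.0], [0.0, 1.0]]
candidate = [[1.0, 0.0]]
print(round(bertscore_f1(reference, candidate), 4))  # recall 0.5, precision 1.0
```

In this toy case the candidate is precise but incomplete, so F1 is about 0.667. A score reported as a percentage, such as 64.49%, presumably corresponds to a value like this scaled by 100. The exact configuration behind the reported numbers is not stated here.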

Authors

Person: Not available
Organization: arXiv
Detecting and Evaluating Medical Hallucinations in Large Vision ...

Sources

Detecting and Evaluating Medical Hallucinations in Large Vision ... (arxiv.org, arXiv)

Referenced by nodes (1)

BERTScore concept