Fact — claim — Knowledge Tree

Traditional natural language processing (NLP) metrics such as METEOR and BLEU do not reflect the factual correctness of Large Vision-Language Model outputs, because they measure only shallow, surface-level similarity to the ground-truth text.
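To make the claim concrete, here is a minimal sketch of the clipped n-gram precision that underlies BLEU-style scoring. It is a simplified stand-in, not the full BLEU or METEOR definition (no brevity penalty, no geometric mean, no synonym matching). The function names and the two example sentences are hypothetical and are not taken from the cited paper.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also occur in the reference.

    Each candidate n-gram count is clipped to its count in the reference,
    as in BLEU's modified precision. Tokenization is a plain whitespace
    split, which is an assumption made for illustration.
    """
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


# Hypothetical report sentences: the candidate drops "no" and so states
# the opposite finding.
reference = "there is no pleural effusion in the left lung"
candidate = "there is pleural effusion in the left lung"

unigram = clipped_precision(candidate, reference, 1)
bigram = clipped_precision(candidate, reference, 2)
print(unigram, bigram)
```

In this example every candidate word appears in the reference and six of the seven candidate bigrams match, so the overlap scores stay high even though the candidate contradicts the reference finding.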

Authors

Person: Not available
Organization: arXiv
Detecting and Evaluating Medical Hallucinations in Large Vision ...

Sources

Detecting and Evaluating Medical Hallucinations in Large Vision ... (arXiv, arxiv.org)

Referenced by nodes (3)