Claim
Traditional natural language processing (NLP) metrics such as BLEU and METEOR fail to reflect the factual correctness of Large Vision-Language Model outputs because they measure only surface-level similarity to the ground-truth text.
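
A minimal sketch of why overlap-based scoring can miss factual errors. It assumes a simplified clipped n-gram precision, which is the core ingredient of BLEU but not the full metric (no brevity penalty, no geometric mean over n-gram orders). The function names and the example sentences are hypothetical and were invented for illustration; they do not come from the cited source.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also occur in the reference.

    Counts are clipped to the reference counts, as in BLEU's modified
    precision. This is a simplification, not the complete BLEU score.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


# Hypothetical report-style sentences, invented for illustration only.
reference = "there is no evidence of pneumonia in the left lung"
hallucinated = "there is clear evidence of pneumonia in the left lung"  # factually opposite
paraphrase = "the left lung shows no signs of pneumonia"  # factually consistent

for label, text in [("hallucinated", hallucinated), ("paraphrase", paraphrase)]:
    p1 = clipped_precision(text, reference, 1)
    p2 = clipped_precision(text, reference, 2)
    print(f"{label}: unigram={p1:.2f} bigram={p2:.2f}")
```

In this toy case the factually opposite sentence scores higher than the faithful paraphrase on both unigram and bigram precision, because it differs from the reference by a single word. That is the kind of shallow similarity the claim describes.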
Authors
Sources
- Detecting and Evaluating Medical Hallucinations in Large Vision ... arxiv.org via serper
Referenced by nodes (3)
- natural language processing concept
- Large Vision-Language Models concept
- BLEU concept