measurement
The Med-HallMark benchmark evaluates AI models on hallucination detection using the MediHall Score alongside traditional text-similarity metrics: BERTScore, METEOR, ROUGE-1, ROUGE-2, ROUGE-L, and BLEU.
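As a rough illustration of what the n-gram overlap metrics in that list measure, the sketch below computes ROUGE-1, ROUGE-2, and ROUGE-L F1 scores from scratch. It is not the benchmark's own evaluation code: the function names, the whitespace tokenization, the lack of stemming, and the use of a plain F1 for ROUGE-L are all simplifying assumptions, and the two example sentences are invented for demonstration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rouge_n(candidate, reference, n=1):
    """ROUGE-N F1 using lowercase whitespace tokens (assumed tokenization)."""
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both texts.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence of tokens."""
    a = candidate.lower().split()
    b = reference.lower().split()
    # Row-by-row dynamic programme for the LCS length.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    lcs = prev[-1]
    if lcs == 0:
        return 0.0
    precision = lcs / len(a)
    recall = lcs / len(b)
    return 2 * precision * recall / (precision + recall)


# Hypothetical model output and reference answer, for illustration only.
candidate = "the scan shows no fracture"
reference = "the scan shows a small fracture"

r1 = rouge_n(candidate, reference, n=1)
r2 = rouge_n(candidate, reference, n=2)
rl = rouge_l(candidate, reference)
print(f"ROUGE-1 F1: {r1:.3f}")
print(f"ROUGE-2 F1: {r2:.3f}")
print(f"ROUGE-L F1: {rl:.3f}")
```

In this example the candidate reverses the clinical meaning of the reference ("no fracture" versus "a small fracture") yet still shares most of its words with it, which shows why surface-overlap scores alone are a weak signal for hallucination and why the benchmark pairs them with the MediHall Score.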
Authors
Sources
- Detecting and Evaluating Medical Hallucinations in Large Vision ... arxiv.org via serper
Referenced by nodes (3)
- BERTScore concept
- BLEU concept
- MediHall Score concept