claim
The Pointwise Score and Similarity Score used in the Med-HALT benchmark do not directly measure clinical safety or the potential for patient harm: an output can be semantically similar to the expected answer yet be clinically inappropriate or omit critical warnings.
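The gap between textual similarity and clinical safety can be illustrated with a toy metric. The sketch below assumes a simple bag-of-words Jaccard overlap as a stand-in for a similarity score. It is not Med-HALT's actual Similarity Score, and the reference and candidate texts are invented for illustration. A tenfold dose change barely moves the score, and dropping the warfarin caution still leaves substantial overlap.

```python
# Toy illustration only: Jaccard word overlap as a hypothetical stand-in
# for a similarity score. This is NOT the metric Med-HALT uses.

def jaccard_similarity(a: str, b: str) -> float:
    """Share of distinct lowercase words common to both texts."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Invented example texts (hypothetical, not taken from Med-HALT).
reference = "take ibuprofen 400 mg every 6 hours with food and avoid if you take warfarin"
# Same wording, but the dose is ten times larger.
dose_error = "take ibuprofen 4000 mg every 6 hours with food and avoid if you take warfarin"
# Keeps the dosing advice but drops the drug-interaction warning.
omits_warning = "take ibuprofen 400 mg every 6 hours with food"

for label, candidate in [("dose_error", dose_error), ("omits_warning", omits_warning)]:
    score = jaccard_similarity(reference, candidate)
    print(f"{label}: similarity={score:.2f}")
```

A score like this rewards surface overlap; flagging the dose error or the missing warning would need a separate, clinically informed check.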
Authors
Sources
- Medical Hallucination in Foundation Models and Their Impact on ... (www.medrxiv.org)
Referenced by nodes (2)
- Med-HALT concept
- clinical safety evaluation framework concept