procedure
The Med-HALT benchmark computes the cosine similarity between the embedding of a model's output and the embeddings of two references: Answer Similarity (the model output compared with the correct option) and Question Similarity (the model output compared with the original question).
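The two scores reduce to the same cosine formula, cos(u, v) = (u · v) / (‖u‖ ‖v‖), applied to two different pairs. The sketch below illustrates that computation under stated assumptions: the embedding model Med-HALT uses is not specified here, so the vectors are made-up toy values, and the helper `similarity_scores` is a hypothetical name, not part of any Med-HALT code.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def similarity_scores(output_emb, answer_emb, question_emb) -> dict:
    """Hypothetical helper: both scores for one model output.

    Assumes all three embeddings come from the same (unspecified) embedding model.
    """
    return {
        # Model output vs. correct option
        "answer_similarity": cosine_similarity(output_emb, answer_emb),
        # Model output vs. original question
        "question_similarity": cosine_similarity(output_emb, question_emb),
    }


# Toy 3-dimensional embeddings, invented for illustration only.
output_emb = [1.0, 0.0, 1.0]
answer_emb = [1.0, 0.0, 1.0]
question_emb = [0.0, 1.0, 0.0]

scores = similarity_scores(output_emb, answer_emb, question_emb)
print(scores)
```

Here the output vector points in the same direction as the answer vector (similarity close to 1) and is orthogonal to the question vector (similarity 0); real sentence embeddings would yield intermediate values.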
Sources
- Medical Hallucination in Foundation Models and Their ... (www.medrxiv.org)
Referenced by nodes (1)
- cosine similarity concept