Relations (1)
related 0.10 — supporting 1 fact
Hallucination and F1 score are related through the evaluation of hallucination detection: F1 is the metric reported for detection performance, and [1] indicates that the best-performing model on the MedHallu benchmark achieved an F1 score as low as 0.625 when detecting 'hard' category hallucinations.
Facts (1)
Sources
[2502.14302] MedHallu: A Comprehensive Benchmark for Detecting ... arxiv.org 1 fact
measurement: The best-performing model on the MedHallu benchmark achieved an F1 score as low as 0.625 for detecting 'hard' category hallucinations.
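
For readers unfamiliar with the metric, the following is a minimal sketch of how an F1 score is computed for a binary hallucination detector. It assumes the standard definition of F1 as the harmonic mean of precision and recall. The function name `f1_score` and the counts in the usage line are hypothetical and chosen only for illustration; they are not taken from the MedHallu paper.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Return the F1 score from confusion-matrix counts.

    F1 = 2*TP / (2*TP + FP + FN), which is algebraically the same as
    the harmonic mean of precision and recall.
    """
    denominator = 2 * true_pos + false_pos + false_neg
    if denominator == 0:
        # No positive labels and no positive predictions: define F1 as 0.0.
        return 0.0
    return 2 * true_pos / denominator


# Hypothetical counts (NOT from MedHallu): 5 hallucinations correctly
# flagged, 3 false alarms, 3 hallucinations missed.
example = f1_score(true_pos=5, false_pos=3, false_neg=3)
print(example)  # 0.625
```

With these illustrative counts, precision and recall are both 5/8, so the detector misses or mislabels a substantial share of cases even though its F1 matches the figure reported above.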