Relations (1)

related 0.10 — supporting 1 fact

Hallucination and F1 score are related because F1 score is a metric used to evaluate how well AI models detect hallucinations. According to [1], the best-performing model on the MedHallu benchmark achieved an F1 score as low as 0.625 when detecting 'hard' category hallucinations.
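
For context, F1 is the harmonic mean of precision and recall on a detection task. The sketch below shows how such a score is computed from confusion counts. The function name `f1_score` and the example counts are hypothetical, chosen only so that the arithmetic lands on 0.625; they are not taken from the MedHallu paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score from true positives, false positives, false negatives.

    F1 = 2 * P * R / (P + R), which simplifies to 2*TP / (2*TP + FP + FN).
    """
    denom = 2 * tp + fp + fn
    # With no positives predicted or present, F1 is undefined; return 0.0 by convention.
    if denom == 0:
        return 0.0
    return 2 * tp / denom


# Hypothetical counts for illustration only (not MedHallu data):
# 5 hallucinations correctly flagged, 3 false alarms, 3 missed.
example = f1_score(tp=5, fp=3, fn=3)
print(example)  # 0.625
```

With these illustrative counts, precision and recall are both 5/8, so a score of 0.625 corresponds to a detector that misses or wrongly flags a substantial share of cases.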

Facts (1)

Sources
[2502.14302] MedHallu: A Comprehensive Benchmark for Detecting ... arxiv.org arXiv 1 fact
measurement: The best-performing model on the MedHallu benchmark achieved an F1 score as low as 0.625 for detecting 'hard' category hallucinations.