formula
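The Cleanlab benchmark evaluates hallucination detectors by AUROC, defined as the probability that the detector's score is lower for a randomly chosen example where the LLM responded incorrectly than for a randomly chosen example where the LLM responded correctly. An AUROC of 0.5 corresponds to scores that carry no information about correctness, and 1.0 to scores that separate incorrect from correct responses perfectly.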
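The pairwise definition above can be computed directly by comparing every incorrect example against every correct one. The following is a minimal sketch, not the benchmark's own code: the function name `pairwise_auroc`, the toy scores, and the convention of counting ties as half a win are assumptions made for illustration.

```python
from itertools import product


def pairwise_auroc(scores, is_correct):
    """Estimate AUROC as the fraction of (incorrect, correct) pairs in which
    the incorrect example received the lower detector score.

    Ties count as half a win (a common convention, assumed here).
    """
    wrong = [s for s, ok in zip(scores, is_correct) if not ok]
    right = [s for s, ok in zip(scores, is_correct) if ok]
    if not wrong or not right:
        raise ValueError("need at least one correct and one incorrect example")

    wins = 0.0
    for w, r in product(wrong, right):
        if w < r:
            wins += 1.0
        elif w == r:
            wins += 0.5
    return wins / (len(wrong) * len(right))


# Hypothetical detector scores (higher = more trustworthy) and correctness labels.
scores = [0.9, 0.4, 0.5, 0.6, 0.2]
is_correct = [True, True, False, True, False]

auroc = pairwise_auroc(scores, is_correct)
print(f"AUROC = {auroc:.3f}")
```

In this toy data, five of the six (incorrect, correct) pairs are ordered as desired, so the estimate is 5/6. The loop is quadratic in the number of examples; for large evaluation sets a rank-based computation gives the same value more efficiently.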
