claim
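In the Cleanlab RAG benchmark, a hallucination detector with a high AUROC more consistently assigns lower scores to incorrect RAG responses than to correct ones. AUROC is the probability that a randomly chosen incorrect response receives a lower score than a randomly chosen correct one.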
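The pairwise reading of AUROC can be sketched in a few lines. This is a minimal illustration, not Cleanlab's evaluation code. It assumes the detector outputs a trustworthiness-style score where higher means "more likely correct". The function name `auroc` and the example scores are hypothetical. The sketch counts the fraction of (incorrect, correct) pairs in which the incorrect response scores lower, with ties counted as half. This matches scikit-learn's `roc_auc_score` when correct responses are labeled as the positive class.

```python
from itertools import product


def auroc(scores_correct, scores_incorrect):
    """Hypothetical helper: fraction of (incorrect, correct) pairs in which
    the incorrect response gets the lower score; ties count as half a pair."""
    if not scores_correct or not scores_incorrect:
        raise ValueError("need at least one correct and one incorrect response")
    total = 0.0
    for bad, good in product(scores_incorrect, scores_correct):
        if bad < good:
            total += 1.0   # pair ordered as a good detector should
        elif bad == good:
            total += 0.5   # tie: detector cannot tell the pair apart
    return total / (len(scores_incorrect) * len(scores_correct))


# Hypothetical detector scores (higher = judged more likely correct).
correct = [0.92, 0.85, 0.70, 0.66]
incorrect = [0.30, 0.55, 0.68]
print(auroc(correct, incorrect))  # 11 of 12 pairs ordered correctly, i.e. 11/12
```

A detector that gives every response the same score lands at 0.5 (chance). One that ranks every incorrect response below every correct response reaches 1.0.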
Authors
Sources
- Real-Time Evaluation Models for RAG: Who Detects Hallucinations ... (cleanlab.ai)
Referenced by nodes (1)
- AUROC concept