Fact — claim — Knowledge Tree

In the Cleanlab RAG benchmark, a detector with a higher AUROC more consistently assigns lower scores to incorrect RAG responses than to correct ones.
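
The claim above follows from the standard ranking interpretation of AUROC: it equals the probability that a randomly chosen correct response receives a higher score than a randomly chosen incorrect one, with ties counted as half. The sketch below illustrates that reading in plain Python. It is not Cleanlab's benchmark code; the function name `auroc`, the label convention (1 = correct, 0 = incorrect), and both score lists are hypothetical and chosen only for illustration.

```python
from itertools import product


def auroc(scores, labels):
    """Pairwise-ranking AUROC: the fraction of (correct, incorrect) pairs
    in which the correct response gets the higher score; ties count 0.5."""
    correct = [s for s, y in zip(scores, labels) if y == 1]
    incorrect = [s for s, y in zip(scores, labels) if y == 0]
    if not correct or not incorrect:
        raise ValueError("need at least one correct and one incorrect response")
    wins = sum(
        1.0 if c > i else 0.5 if c == i else 0.0
        for c, i in product(correct, incorrect)
    )
    return wins / (len(correct) * len(incorrect))


# Hypothetical data (assumption): 1 = correct response, 0 = incorrect response.
labels = [1, 1, 1, 0, 0, 0]
# Detector A scores every incorrect response below every correct one.
detector_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
# Detector B mixes the two groups, so its ranking is less consistent.
detector_b = [0.9, 0.2, 0.6, 0.8, 0.3, 0.5]

auroc_a = auroc(detector_a, labels)
auroc_b = auroc(detector_b, labels)
print(f"detector A: {auroc_a:.3f}, detector B: {auroc_b:.3f}")
```

With these made-up scores, detector A ranks all nine correct/incorrect pairs the right way round, while detector B gets only five of the nine right, so A has the higher AUROC. The pairwise loop is quadratic in the number of responses; it is written for readability rather than speed.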

Authors

Person: Not available
Organization: Cleanlab
Real-Time Evaluation Models for RAG: Who Detects Hallucinations ...

Sources

Real-Time Evaluation Models for RAG: Who Detects Hallucinations ... (cleanlab.ai, Cleanlab, via serper)

Referenced by nodes (1)

AUROC concept