claim
The Cleanlab RAG benchmark evaluates how effectively detection methods flag incorrect responses, rather than focusing on finer-grained concerns such as retrieval quality, faithfulness, or context utilization.
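As a rough illustration of what "flagging incorrect responses" can mean in practice, the sketch below scores a detector by how well its flag scores rank incorrect responses above correct ones. The choice of AUROC as the metric, the function name `detection_auroc`, and the sample scores and labels are assumptions made for illustration; none of them are taken from this claim or its source.

```python
def detection_auroc(scores, is_incorrect):
    """Probability that a randomly chosen incorrect response receives a
    higher flag score than a randomly chosen correct one (ties count 0.5).

    scores: detector outputs, where higher means "more likely incorrect".
    is_incorrect: ground-truth booleans, True if the response is wrong.
    """
    pos = [s for s, bad in zip(scores, is_incorrect) if bad]
    neg = [s for s, bad in zip(scores, is_incorrect) if not bad]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect response")
    # Compare every incorrect response against every correct one.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical detector scores and correctness labels (illustrative only).
scores = [0.9, 0.2, 0.3, 0.1, 0.4]
is_incorrect = [True, False, True, False, False]
auroc = detection_auroc(scores, is_incorrect)
```

Only the final correctness label of each response enters this calculation, which is the sense in which such an evaluation sets aside retrieval quality, faithfulness, and context utilization.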
Authors
Sources
- Real-Time Evaluation Models for RAG: Who Detects Hallucinations ... cleanlab.ai via serper
Referenced by nodes (1)
- Cleanlab entity