claim
The Cleanlab RAG benchmark datasets are composed of entries containing a user query, retrieved context, an LLM-generated response, and a binary annotation indicating whether the response was correct.
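As a rough illustration of the entry structure described in the claim, one entry could be modeled as a small record with four fields. This is a minimal sketch only: the class name `RagBenchmarkEntry`, the field names, the helper `fraction_correct`, and the toy entries are all hypothetical and are not taken from the Cleanlab datasets or any Cleanlab API.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RagBenchmarkEntry:
    """One benchmark entry. Field names are illustrative assumptions."""
    query: str        # the user query
    context: str      # the retrieved context supplied to the LLM
    response: str     # the LLM-generated response
    is_correct: bool  # binary annotation: was the response correct?


def fraction_correct(entries: Sequence[RagBenchmarkEntry]) -> float:
    """Return the share of entries annotated as correct (0.0 if empty)."""
    if not entries:
        return 0.0
    return sum(e.is_correct for e in entries) / len(entries)


# Made-up toy entries for illustration, not real benchmark data.
EXAMPLE_ENTRIES = [
    RagBenchmarkEntry(
        query="What is the capital of France?",
        context="Paris is the capital and largest city of France.",
        response="Paris",
        is_correct=True,
    ),
    RagBenchmarkEntry(
        query="What is the capital of France?",
        context="Paris is the capital and largest city of France.",
        response="Lyon",
        is_correct=False,
    ),
]

if __name__ == "__main__":
    print(fraction_correct(EXAMPLE_ENTRIES))
```

With a binary annotation on each entry, a hallucination detector's score for a response can be compared directly against `is_correct`, which is what makes this layout convenient for benchmarking.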
Authors
Sources
- Real-Time Evaluation Models for RAG: Who Detects Hallucinations ... cleanlab.ai via serper
Referenced by nodes (1)
- Cleanlab entity