claim
The Cleanlab RAG benchmark reports results on six datasets: four from the HaluBench suite, plus FinQA and ELI5.
Authors
Sources
- Real-Time Evaluation Models for RAG: Who Detects Hallucinations ... cleanlab.ai via serper
Referenced by nodes (1)
- ELI concept