claim
The Cleanlab researchers excluded the HaluEval and RAGTruth datasets from their benchmark suite because they discovered significant errors in those datasets' ground-truth annotations.
Authors
Sources
- Benchmarking Hallucination Detection Methods in RAG, Cleanlab (cleanlab.ai)