Fact — measurement — Knowledge Tree

The RAGAS Faithfulness evaluation framework experienced a 58.90% failure rate on the DROP dataset, 0.70% on RAGTruth, 83.50% on FinanceBench, 0.10% on PubMedQA, and 21.20% on CovidQA, where a failure is defined as the software returning an error instead of a score.

Authors

Person: Not available Organization: Cleanlab
Benchmarking Hallucination Detection Methods in RAG - Cleanlab

Sources

Benchmarking Hallucination Detection Methods in RAG - Cleanlab cleanlab.ai Cleanlab via serper

Referenced by nodes (5)

CovidQA concept
DROP concept
RAGTruth concept
PubmedQA concept
FinanceBench concept