claim
Apart from the basic Self-Evaluation technique, most hallucination detection methods struggled to significantly outperform random guessing when evaluated on the FinanceBench dataset.
Authors
Sources
- Benchmarking Hallucination Detection Methods in RAG, Cleanlab (cleanlab.ai)
Referenced by nodes (2)
- LLM-as-a-judge concept
- FinanceBench concept