claim
The RAGAS hallucination detection metric often fails to produce the internal LLM-generated statements needed for its computations when applied to the FinanceBench dataset, because RAGAS works best when answers are complete sentences rather than single numeric values.
Authors
Sources
- Benchmarking Hallucination Detection Methods in RAG - Cleanlab (cleanlab.ai, via serper)
Referenced by nodes (2)
- RAGAS concept
- FinanceBench concept