measurement
On the DROP dataset, the Trustworthy Language Model (TLM) performed best at hallucination detection, followed by the improved RAGAS metrics and LLM Self-Evaluation.
Sources
- Benchmarking Hallucination Detection Methods in RAG - Cleanlab (cleanlab.ai)
Referenced by nodes (3)
- RAGAS concept
- DROP concept
- Trustworthy Language Model concept