claim
A study benchmarking hallucination-detection models, including Patronus Lynx, Prometheus 2, and HHEM, found that the Trustworthy Language Model (TLM) detected incorrect RAG responses with consistently higher precision and recall than those models.
Authors
Sources
- Benchmarking Hallucination Detection Methods in RAG, Cleanlab (cleanlab.ai)
Referenced by nodes (2)
- RAG concept
- Trustworthy Language Model concept