Fact — claim — Knowledge Tree

A Cleanlab study benchmarking hallucination-detection models, including Patronus Lynx, Prometheus 2, and HHEM, found that the Trustworthy Language Model (TLM) detects incorrect RAG responses with universally higher precision and recall than those models.

Authors

Person: Not available
Organization: Cleanlab
Benchmarking Hallucination Detection Methods in RAG - Cleanlab

Sources

Benchmarking Hallucination Detection Methods in RAG - Cleanlab (cleanlab.ai)

Referenced by nodes (2)

RAG concept
Trustworthy Language Model concept