claim
A benchmark study of evaluation models, including Patronus Lynx, Prometheus 2, and HHEM, found that the Trustworthy Language Model (TLM) detects incorrect RAG responses with consistently higher precision and recall than those models.
