Claim
A Cleanlab benchmark found that the Trustworthy Language Model (TLM) detects incorrect responses more effectively than LLM-as-a-judge or token-probability (logprobs) techniques across all major LLMs evaluated.
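For context, here is a minimal sketch of the two baseline techniques the claim compares TLM against: scoring a response by its mean token log-probability, and asking a second model to judge it (LLM-as-a-judge). It assumes the OpenAI chat API; the model name and judge prompt are illustrative assumptions, not details from the benchmark, and TLM itself is a Cleanlab product that is not reimplemented here.

```python
# Illustrative baselines only; model choice and prompts are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def logprob_confidence(question: str, model: str = "gpt-4o-mini") -> tuple[str, float]:
    """Token-probability baseline: score a response by its mean token logprob."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
        logprobs=True,
    )
    tokens = resp.choices[0].logprobs.content
    score = sum(t.logprob for t in tokens) / len(tokens)  # higher = more confident
    return resp.choices[0].message.content, score


def judge_confidence(question: str, answer: str, judge: str = "gpt-4o-mini") -> float:
    """LLM-as-a-judge baseline: a second model rates the answer from 0 to 1."""
    prompt = (
        f"Question: {question}\nAnswer: {answer}\n"
        "On a scale from 0 to 1, how likely is this answer to be correct? "
        "Reply with only the number."
    )
    resp = client.chat.completions.create(
        model=judge,
        messages=[{"role": "user", "content": prompt}],
    )
    return float(resp.choices[0].message.content.strip())
```

Either score can be thresholded to flag likely-incorrect responses; the benchmark's finding is that TLM's trustworthiness scores do this more reliably than both baselines.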
Authors
Sources
- Benchmarking Hallucination Detection Methods in RAG, Cleanlab (cleanlab.ai)
Referenced by nodes (2)
- LLM-as-a-judge concept
- Trustworthy Language Model concept