Claim
A Cleanlab benchmark found that the Trustworthy Language Model (TLM) detects incorrect responses more effectively than LLM-as-a-judge or token-probability (logprobs) techniques across all major LLMs evaluated.
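For context, here is a minimal sketch of the two baseline techniques the claim compares TLM against: scoring a response by its mean token log-probability, and asking a second model to judge it (LLM-as-a-judge). It assumes the OpenAI chat API; the model name and judge prompt are illustrative assumptions, not details from the benchmark, and TLM itself is a Cleanlab product that is not reimplemented here.

```python
# Illustrative baselines only; model choice and prompts are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def logprob_confidence(question: str, model: str = "gpt-4o-mini") -> tuple[str, float]:
    """Token-probability baseline: score a response by its mean token logprob."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
        logprobs=True,
    )
    tokens = resp.choices[0].logprobs.content
    score = sum(t.logprob for t in tokens) / len(tokens)  # higher = more confident
    return resp.choices[0].message.content, score


def judge_confidence(question: str, answer: str, judge: str = "gpt-4o-mini") -> float:
    """LLM-as-a-judge baseline: a second model rates the answer from 0 to 1."""
    prompt = (
        f"Question: {question}\nAnswer: {answer}\n"
        "On a scale from 0 to 1, how likely is this answer to be correct? "
        "Reply with only the number."
    )
    resp = client.chat.completions.create(
        model=judge,
        messages=[{"role": "user", "content": prompt}],
    )
    return float(resp.choices[0].message.content.strip())
```

Either score can be thresholded to flag likely-incorrect responses; the benchmark's finding is that TLM's trustworthiness scores do this more reliably than both baselines.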
Authors
Sources
- Benchmarking Hallucination Detection Methods in RAG, Cleanlab (cleanlab.ai)
Referenced by nodes (2)
- LLM-as-a-judge concept
- Trustworthy Language Model concept