reference
A study found that the Trustworthy Language Model (TLM) detects incorrect RAG (retrieval-augmented generation) responses more effectively than techniques such as LLM-as-a-judge or token probabilities (logprobs), across all major large language models (LLMs).
Authors
Sources
- Real-Time Evaluation Models for RAG: Who Detects Hallucinations ... (cleanlab.ai)
Referenced by nodes (4)
- Large Language Models concept
- LLM-as-a-judge concept
- Trustworthy Language Model concept
- TLM concept