Fact — claim — Knowledge Tree

On the ELI5 benchmark, the Prometheus and TLM evaluation models detect incorrect AI responses more effectively than the other detectors, though no method achieves very high precision or recall.
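
To make the precision and recall terms in this claim concrete, here is a minimal sketch of how they could be computed for any detector that assigns a trust score to each response. The function name, the threshold, and the toy scores and labels are all hypothetical; they are not taken from the Cleanlab benchmark, and real evaluation models may expose scores differently.

```python
def precision_recall(scores, is_incorrect, threshold):
    """Flag a response as incorrect when its score falls below the threshold,
    then compare the flags against ground-truth labels."""
    flagged = [s < threshold for s in scores]
    tp = sum(f and y for f, y in zip(flagged, is_incorrect))        # flagged, truly incorrect
    fp = sum(f and not y for f, y in zip(flagged, is_incorrect))    # flagged, actually correct
    fn = sum((not f) and y for f, y in zip(flagged, is_incorrect))  # missed incorrect response
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy data: six responses with made-up scores and labels.
scores = [0.9, 0.2, 0.4, 0.8, 0.3, 0.7]
is_incorrect = [False, True, False, False, True, True]

p, r = precision_recall(scores, is_incorrect, threshold=0.5)
print(f"precision={p:.2f} recall={r:.2f}")
```

Raising the threshold flags more responses, which tends to raise recall at the cost of precision. That trade-off is why a claim like this one reports both quantities.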

Authors

Person: Not available
Organization: Cleanlab
Real-Time Evaluation Models for RAG: Who Detects Hallucinations ...

Sources

Real-Time Evaluation Models for RAG: Who Detects Hallucinations ... (cleanlab.ai, Cleanlab)

Referenced by nodes (2)

ELI concept
TLM concept