Fact — claim — Knowledge Tree

In the PubmedQA benchmark, the Prometheus and TLM evaluation models detect incorrect AI responses with the highest precision and recall, effectively catching hallucinations.

Authors

Person: Not available Organization: Cleanlab
Real-Time Evaluation Models for RAG: Who Detects Hallucinations ...

Sources

Real-Time Evaluation Models for RAG: Who Detects Hallucinations ... cleanlab.ai Cleanlab via serper

Referenced by nodes (2)

PubmedQA concept
TLM concept