claim
The Cleanlab RAG benchmark uses OpenAI’s gpt-4o-mini LLM to power both the 'LLM-as-a-judge' and 'TLM' scoring methods.
Authors
Sources
- Real-Time Evaluation Models for RAG: Who Detects Hallucinations ... cleanlab.ai via serper
Referenced by nodes (4)
- LLM-as-a-judge concept
- OpenAI entity
- gpt-4o-mini concept
- TLM concept