Claim
A potential limitation of the LLM-as-a-judge approach is that, because hallucinations stem from the unreliability of Large Language Models, relying on the same model to evaluate its own outputs may not sufficiently close the reliability gap.
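A minimal sketch of the setup the claim refers to, assuming a generic prompt-in/completion-out adapter (the `LLM` callable, `JUDGE_PROMPT`, and both function names are illustrative, not from the source): when one model fills both the generator and judge roles, the two share weights and therefore blind spots, which is the correlated-error concern raised above.

```python
from typing import Callable

# Hypothetical adapter type: any function mapping a prompt string to a
# model completion string (wire it to whatever chat-completion API you use).
LLM = Callable[[str], str]

JUDGE_PROMPT = (
    "You are a strict fact-checker. Given a question, retrieved context, "
    "and an answer, reply with exactly 'SUPPORTED' or 'HALLUCINATED'.\n\n"
    "Question: {question}\nContext: {context}\nAnswer: {answer}"
)


def judge_answer(judge: LLM, question: str, context: str, answer: str) -> bool:
    """Return True if the judge model labels the answer as supported."""
    verdict = judge(
        JUDGE_PROMPT.format(question=question, context=context, answer=answer)
    )
    return verdict.strip().upper().startswith("SUPPORTED")


def self_judged_answer(model: LLM, question: str, context: str) -> tuple[str, bool]:
    """Same-model judging: the generator also grades its own output.

    This is the case the claim flags: a hallucination the model cannot
    detect in its own answer tends to pass through unflagged, because
    generator and judge fail in the same places.
    """
    answer = model(f"Question: {question}\nContext: {context}\nAnswer:")
    return answer, judge_answer(model, question, context, answer)
```

One common response is to pass a separate model as the judge, which decorrelates some errors, though the claim's point is that any LLM judge inherits the same underlying class of unreliability.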
Authors
Sources
- Real-Time Evaluation Models for RAG: Who Detects Hallucinations ... (cleanlab.ai, via Serper)
Referenced by nodes (2)
- Large Language Models concept
- LLM-as-a-judge concept