Claim
LLM-based evaluation, particularly using GPT-4, yields the best overall results for detecting hallucinations in language models.
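A minimal sketch of the LLM-as-judge setup the claim refers to, in which a strong model such as GPT-4 is prompted to compare an output against its source and flag hallucinations. The prompt wording and helper names here are illustrative assumptions, not taken from the cited paper; in practice the prompt would be sent to a chat-completion endpoint and the reply parsed for a verdict.

```python
# Illustrative LLM-as-judge hallucination check.
# Prompt template and function names are hypothetical, not from the source.

def build_judge_prompt(source: str, summary: str) -> str:
    """Assemble a prompt asking a judge model (e.g. GPT-4) for a verdict."""
    return (
        "You are checking a summary for hallucinations.\n\n"
        f"Source document:\n{source}\n\n"
        f"Summary to check:\n{summary}\n\n"
        "Answer with exactly one word: FAITHFUL or HALLUCINATED."
    )

def parse_verdict(reply: str) -> bool:
    """Return True if the judge's reply flags a hallucination."""
    return "HALLUCINATED" in reply.strip().upper()
```

The boolean verdict can then be aggregated over a dataset to score the judge against human hallucination labels.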
Authors
Sources
- Evaluating Evaluation Metrics -- The Mirage of Hallucination Detection (arxiv.org)
Referenced by nodes (1)
- Language Model concept