measurement
Several established hallucination detection methods show performance drops of up to 45.9% when assessed with human-aligned metrics such as LLM-as-Judge instead of traditional metrics.
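A minimal sketch of how such a drop can be quantified: score the same detector outputs against two label sets, one from a traditional automatic metric and one from an LLM-as-Judge protocol, and compare the resulting AUROC. All scores, labels, and the 37.5% gap below are hypothetical illustrations, not figures from the cited paper.

```python
# Hypothetical sketch: quantifying the performance gap between two
# evaluation protocols for a hallucination detector. All data is made up.

def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U divided by n_pos * n_neg)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Detector confidence that each response is hallucinated.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

# Labels from a traditional automatic metric vs. labels from an
# LLM-as-Judge protocol -- the two protocols disagree on several cases.
traditional = [1, 1, 1, 1, 0, 0, 0, 0]
judge       = [1, 0, 1, 0, 1, 0, 1, 0]

a_trad = auroc(scores, traditional)
a_judge = auroc(scores, judge)
drop = 100 * (a_trad - a_judge) / a_trad
print(f"AUROC traditional={a_trad:.2f} judge={a_judge:.2f} drop={drop:.1f}%")
```

The same detector that looks perfect under the traditional labels loses a large fraction of its measured AUROC once the judge-based labels are used, which is the kind of re-evaluation effect the finding above describes.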
Sources
- The Illusion of Progress: Re-evaluating Hallucination Detection in ... (arxiv.org)
- Re-evaluating Hallucination Detection in LLMs (arxiv.org)
Referenced by nodes (2)
- hallucination detection concept
- LLM-as-a-judge concept