Several established hallucination detection methods for Large Language Models exhibit performance drops of up to 45.9% when evaluated using human-aligned metrics such as LLM-as-a-Judge.
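A measured "performance drop" of this kind is a property of the evaluation labels, not the detector itself: the same detector verdicts can score very differently depending on whether ground truth comes from a surface-level heuristic or a human-aligned LLM judge. The sketch below illustrates this with entirely hypothetical data — the detector verdicts, both label sets, and the resulting numbers are invented for illustration and are not from the source.

```python
# Hypothetical sketch: the same hallucination detector scored against two
# different label sources. All verdicts and labels below are illustrative.

def accuracy(predictions, labels):
    """Fraction of items where the detector's verdict matches the label."""
    return sum(p == l for p, l in zip(predictions, labels)) / len(labels)

# Detector verdicts for five responses (True = flagged as hallucination).
detector = [True, False, True, True, False]

# Labels from a surface-level heuristic, e.g. lexical overlap (hypothetical).
heuristic_labels = [True, False, True, True, False]

# Labels from a human-aligned LLM judge (hypothetical): the judge accepts
# two paraphrased responses the heuristic had marked as hallucinations.
judge_labels = [True, False, False, False, False]

acc_heuristic = accuracy(detector, heuristic_labels)
acc_judge = accuracy(detector, judge_labels)
relative_drop = (acc_heuristic - acc_judge) / acc_heuristic

print(f"heuristic acc={acc_heuristic:.2f}, "
      f"judge acc={acc_judge:.2f}, drop={relative_drop:.0%}")
```

Here the detector scores perfectly against the heuristic labels but drops sharply against the judge labels, mirroring (on toy data) the kind of gap the figure above describes.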
Referenced by nodes (2)
- Large Language Models (concept)
- LLM-as-a-Judge (concept)