measurement
Several established hallucination detection methods for Large Language Models exhibit performance drops of up to 45.9% when evaluated using human-aligned metrics such as LLM-as-a-Judge.
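Such a drop can be made concrete with a toy calculation. The sketch below is purely illustrative (the detector outputs, label sets, and the size of the resulting gap are hypothetical, not taken from the cited evaluation): it scores the same detector against labels from a surface-level automatic metric and against labels from a human-aligned LLM judge, then computes the relative performance drop.

```python
def accuracy(preds, labels):
    """Fraction of predictions that match the reference labels."""
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

# Hypothetical binary hallucination flags (1 = hallucinated) on ten outputs.
detector       = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # detection method's verdicts
lexical_labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # automatic-metric labels (detector looks perfect)
judge_labels   = [1, 0, 0, 1, 1, 0, 0, 1, 1, 0]  # human-aligned judge labels (frequent disagreement)

acc_lexical = accuracy(detector, lexical_labels)
acc_judge = accuracy(detector, judge_labels)

# Relative drop when switching to the human-aligned evaluation.
drop = (acc_lexical - acc_judge) / acc_lexical
print(f"lexical: {acc_lexical:.2f}, judge: {acc_judge:.2f}, drop: {drop:.1%}")
```

The point of the sketch is that the detector itself is unchanged; only the reference labels differ, which is exactly the kind of metric-induced gap the 45.9% figure describes.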
