measurement
Several established hallucination detection methods show performance drops of up to 45.9% when evaluated with human-aligned metrics such as LLM-as-Judge rather than with traditional metrics.
