claim
When validated against human judgments, LLM-as-Judge evaluation reveals significant performance drops across all hallucination detection methods once they are assessed on factual accuracy.

Authors

Sources

Referenced by nodes (3)