claim
Human evaluation is considered the gold standard for hallucination detection in Large Language Models, though it is costly to implement.
