procedure
The authors measured how closely various evaluation metrics agree with LLM-as-Judge annotations, using that agreement to compare automatic labeling strategies for hallucination detection.
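As an illustration of what such an agreement check might look like, the sketch below binarizes a metric's scores and compares the resulting labels against judge labels using Cohen's kappa. The source does not say which agreement statistic the authors used, so the choice of Cohen's kappa, the 0.5 threshold, and all data values here are assumptions made purely for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of items on which the two labelers agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each labeler's marginal label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both labelers used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical data: 1 = hallucinated, 0 = faithful.
judge_labels = [1, 0, 1, 1, 0, 0, 1, 0]                   # LLM-as-Judge annotations (made up)
metric_scores = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.8, 0.3]  # scores from one metric (made up)
threshold = 0.5                                           # assumed cut-off, not from the source

# Turn the metric's continuous scores into automatic labels.
metric_labels = [1 if s >= threshold else 0 for s in metric_scores]

kappa = cohen_kappa(judge_labels, metric_labels)
print(f"Cohen's kappa between metric and judge: {kappa:.2f}")
```

Repeating this for each candidate metric or labeling strategy gives one agreement score per strategy, which can then be ranked. Kappa is used in this sketch rather than raw accuracy because it corrects for the agreement two labelers would reach by chance.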
