measurement
F1 scores for hallucination detection methods are consistently higher on HaluBench than on RAGTruth, suggesting that RAGTruth is a more difficult benchmark.
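As a minimal sketch of how such a comparison could be checked, the snippet below computes binary F1 (the harmonic mean of precision and recall, with 1 marking a hallucinated response) and tests whether every method scores higher on one benchmark than on another. The function names `f1_score` and `consistently_higher` and the toy labels are assumptions made for illustration; they are not taken from HaluBench, RAGTruth, or any evaluated method.

```python
def f1_score(y_true, y_pred):
    """Binary F1 for the positive class (1 = hallucination)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def consistently_higher(scores_a, scores_b):
    """True if every method present in both dicts scores strictly higher in scores_a."""
    shared = scores_a.keys() & scores_b.keys()
    return bool(shared) and all(scores_a[m] > scores_b[m] for m in shared)


# Toy labels only (hypothetical, not drawn from either benchmark).
toy_true = [1, 1, 0, 0, 1]
toy_pred = [1, 0, 0, 1, 1]
toy_f1 = f1_score(toy_true, toy_pred)
```

In practice, `scores_a` and `scores_b` would map each detection method to its F1 on the two benchmarks. A consistent gap across methods supports the difficulty reading, though it does not by itself rule out other causes such as differences in label distribution.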
