procedure
To assess how faithful model-generated summaries are to their original documents, the Hallucination Leaderboard uses three metrics: ROUGE, which measures n-gram overlap between the generated and reference text; factKB, a generalisable model-based metric for factuality evaluation; and BERTScore-Precision, which scores the similarity of two texts from the similarities between their token representations.
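As a rough illustration of the two similarity-based metrics, the sketch below computes a ROUGE-N style overlap score and a BERTScore-style precision using only the Python standard library. It is a simplified sketch under stated assumptions, not the leaderboard's implementation: the function names `rouge_n` and `greedy_precision` are hypothetical, tokenisation is plain whitespace splitting, and the token vectors are toy inputs standing in for the contextual embeddings a real BERTScore computation would obtain from a pretrained model. factKB is omitted because it relies on a trained classifier.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, F1) of clipped n-gram overlap.

    Assumption: lowercased whitespace tokenisation, no stemming.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    if overlap == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_precision(cand_vecs, ref_vecs):
    """BERTScore-style precision: match each candidate token vector to its
    most similar reference token vector and average those similarities.

    Assumption: the vectors are supplied by the caller (toy values here).
    """
    if not cand_vecs or not ref_vecs:
        return 0.0
    best = [max(cosine(c, r) for r in ref_vecs) for c in cand_vecs]
    return sum(best) / len(best)


if __name__ == "__main__":
    print(rouge_n("the cat sat", "the cat sat on the mat", n=1))
    print(greedy_precision([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]]))
```

Precision is the natural direction for a faithfulness check: it asks how much of the generated summary is supported by the text it is compared against, so unsupported tokens in the summary lower the score.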
