procedure
To assess how faithful model-generated summaries are to their source documents, the Hallucination Leaderboard uses three metrics: ROUGE, which measures the overlap between generated and reference text; factKB, a generalisable model-based metric for factuality evaluation; and BERTScore-Precision, which scores the similarity of two texts from the similarities of their token representations.
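As a rough illustration of two of these metrics, the sketch below computes a simplified ROUGE-N (clipped n-gram overlap) and a BERTScore-style precision (each candidate token matched to its most similar reference token, then averaged). This is not the leaderboard's implementation: the function names `rouge_n` and `bertscore_precision` are hypothetical, tokenisation is plain whitespace splitting, and the token vectors are toy lists standing in for the contextual embeddings a real BERTScore run would take from a pretrained model. factKB is omitted because it depends on a trained model.

```python
import math
from collections import Counter


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict:
    """Simplified ROUGE-N: clipped n-gram overlap between two texts.

    Assumption: lowercased whitespace tokens, no stemming or stopword handling.
    """
    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def bertscore_precision(cand_vecs: list, ref_vecs: list) -> float:
    """Greedy-matching precision over token vectors.

    Each candidate token takes its best cosine match among the reference
    tokens; the scores are averaged over the candidate tokens.
    Assumption: the vectors are supplied by the caller (toy values here).
    """
    if not cand_vecs or not ref_vecs:
        return 0.0
    best = [max(cosine(c, r) for r in ref_vecs) for c in cand_vecs]
    return sum(best) / len(best)


if __name__ == "__main__":
    scores = rouge_n("the cat sat on the mat", "the cat lay on the mat")
    print(scores)
    # Toy 2-d "embeddings": one candidate token matches the reference, one does not.
    print(bertscore_precision([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]]))
```

The precision variant is the one that penalises generated tokens with no close counterpart in the comparison text, which is why it suits a faithfulness check: content in the summary that is unsupported lowers the score, while omitted content does not.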
Sources
- The Hallucinations Leaderboard, an Open Effort to Measure ... huggingface.co via serper
Referenced by nodes (2)
- ROUGE concept
- Hallucination Leaderboard concept