measurement
Generation tasks are evaluated with Perplexity, Unigram Overlap (F1), BLEU-4, ROUGE-L, Knowledge F1, and Rare F1. The datasets include Wizard of Wikipedia (WoW) and CMU Document Grounded Conversations (CMU_DoG), with the KILT Wikipedia dump as the knowledge source.
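As a rough illustration of two of the metrics named above, the sketch below computes a unigram-overlap F1 and a perplexity value. It assumes lowercased whitespace tokenization and per-token natural-log probabilities supplied by the caller; the function names `unigram_f1` and `perplexity` are hypothetical, and published implementations typically apply further text normalization, so scores will not match theirs exactly.

```python
import math
from collections import Counter


def unigram_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of unigram precision and recall (assumed tokenization:
    lowercase + whitespace split)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def perplexity(token_log_probs: list[float]) -> float:
    """exp of the mean negative log-likelihood over the given tokens."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))
```

Under the same assumptions, a Knowledge F1 style score could be approximated by calling `unigram_f1(response, knowledge_sentence)`, that is, by scoring the model response against the grounding knowledge text instead of the human reference response.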
Authors
Sources
- EdinburghNLP/awesome-hallucination-detection (GitHub, github.com)
Referenced by nodes (3)
- BLEU concept
- Perplexity concept
- F1 concept