measurement
Evaluation of generation tasks uses Perplexity, Unigram Overlap (F1), BLEU-4, ROUGE-L, Knowledge F1, and Rare F1 as metrics, and utilizes datasets including WoW and CMU Document Grounded Conversations (CMU_DoG) with the KiLT Wikipedia dump as the knowledge source.

Authors

Sources

Referenced by nodes (3)