claim
Current evaluation metrics like BLEU (Papineni et al., 2002) and ROUGE (Lin, 2004) mainly measure surface text similarity and fail to effectively capture the semantic consistency between generated text and knowledge graph content.

Authors

Sources

Referenced by nodes (3)