The study compared several alternative text-evaluation metrics, namely BERTScore (Zhang et al., 2020), BLEU (Papineni et al., 2002), SummaC (Laban et al., 2022), and UniEval-fact (Zhong et al., 2022), benchmarking each against LLM-as-Judge labels.
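
One way to benchmark a continuous metric against binary judge labels can be sketched as follows. The source does not say which agreement statistic the study used, so the choice of AUROC here is an assumption, and the names `metric_a`, `metric_b`, and all scores and labels are hypothetical toy values, not results from the study.

```python
from itertools import product


def auroc(scores, labels):
    """Probability that a randomly chosen positive example outscores a
    randomly chosen negative one; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = the judge accepted the text, 0 = the judge rejected it.
judge_labels = [1, 1, 0, 1, 0, 0]

# Hypothetical per-example scores from two placeholder metrics.
metric_scores = {
    "metric_a": [0.91, 0.85, 0.40, 0.77, 0.52, 0.30],
    "metric_b": [0.60, 0.35, 0.50, 0.70, 0.65, 0.20],
}

# One agreement score per metric; higher means closer agreement with the judge.
results = {name: auroc(s, judge_labels) for name, s in metric_scores.items()}

for name, value in results.items():
    print(f"{name}: AUROC vs. judge labels = {value:.3f}")
```

A rank-based statistic such as AUROC avoids picking a decision threshold for each metric, which matters when the metrics live on different scales. A correlation coefficient or thresholded accuracy would be equally reasonable alternatives.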
