reference
The study evaluated several alternative metrics for text evaluation, including BERTScore (Zhang et al., 2020), BLEU (Papineni et al., 2002), SummaC (Laban et al., 2022), and UniEval-fact (Zhong et al., 2022), benchmarking them against LLM-as-Judge labels.
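Benchmarking a metric against LLM-as-Judge labels typically means checking how well the metric's continuous scores separate the judge's binary verdicts. A minimal sketch of that comparison, using AUROC and invented example scores (the numbers and the AUROC framing are illustrative assumptions, not taken from the study):

```python
def auroc(scores, labels):
    """AUROC: probability a randomly chosen positive example
    outscores a randomly chosen negative one (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical metric scores (e.g., SummaC-style consistency scores)
# paired with binary LLM-as-Judge labels (1 = consistent, 0 = hallucinated).
metric_scores = [0.91, 0.78, 0.30, 0.35, 0.12, 0.40]
judge_labels  = [1,    1,    1,    0,    0,    0]

print(auroc(metric_scores, judge_labels))  # → 0.777...
```

A higher AUROC means the metric's ranking agrees more closely with the judge's labels; 0.5 is chance level.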
Sources
- Re-evaluating Hallucination Detection in LLMs (arXiv)
Referenced by nodes (2)
- SummaC concept
- LLM-as-a-judge concept