Relations (1)

related 1.00 — strongly supporting 1 fact

The concepts are related because researchers analyzed the failure modes of the ROUGE metric specifically in the context of detecting hallucination in LLM outputs, as described in [1].

Facts (1)

Sources
[1] Re-evaluating Hallucination Detection in LLMs, arXiv (arxiv.org), 1 fact
Procedure: To examine ROUGE's failure modes, the researchers curated a dataset of instances where ROUGE and an LLM-as-Judge metric gave conflicting assessments of whether a hallucination was present.
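
The curation step described above can be sketched as a simple disagreement filter. This is only an illustrative sketch, not the researchers' actual pipeline: the `Example` record, the `rouge_flags_hallucination` and `conflicting_instances` helpers, the 0.5 threshold, and the toy scores are all assumptions introduced here, and the source does not state how ROUGE scores were turned into hallucination labels.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """Hypothetical record for one model response (names are assumptions)."""
    response: str
    rouge_score: float        # precomputed ROUGE score against a reference
    judge_hallucinated: bool  # verdict from an LLM-as-Judge metric


def rouge_flags_hallucination(score: float, threshold: float = 0.5) -> bool:
    # Assumption: low overlap with the reference is read as a hallucination.
    # The threshold value is illustrative, not taken from the source.
    return score < threshold


def conflicting_instances(examples, threshold: float = 0.5):
    """Keep only the examples where ROUGE and the judge disagree."""
    return [
        ex for ex in examples
        if rouge_flags_hallucination(ex.rouge_score, threshold) != ex.judge_hallucinated
    ]


# Toy data, invented purely for illustration.
examples = [
    Example("a", 0.9, False),  # both say "no hallucination"
    Example("b", 0.2, False),  # ROUGE flags it, judge does not
    Example("c", 0.8, True),   # judge flags it, ROUGE does not
    Example("d", 0.1, True),   # both say "hallucination"
]

conflicts = conflicting_instances(examples)
print([ex.response for ex in conflicts])
```

Under these assumptions, the filtered list holds exactly the cases worth inspecting by hand, since each one is a point where at least one of the two metrics must be wrong.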