Relations (1)
related 1.00 — strongly supporting 1 fact
The two concepts are related because the researchers analyzed the failure modes of the ROUGE metric specifically in the context of detecting hallucinations in LLM outputs [1].
Facts (1)
Sources
Re-evaluating Hallucination Detection in LLMs (arXiv, arxiv.org): 1 fact
Procedure: The researchers curated a dataset of instances in which ROUGE and an LLM-as-Judge metric gave conflicting assessments of whether a hallucination was present, in order to examine ROUGE's failure modes.
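The curation step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the ROUGE variant (a plain whitespace-token ROUGE-L F1), the 0.5 decision threshold, and the names `Example`, `rouge_l_f1`, and `find_conflicts` are all assumptions made for illustration, and the LLM-as-Judge verdict is taken as a precomputed boolean rather than obtained from a model.

```python
from dataclasses import dataclass


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """Simplified ROUGE-L F1 over lowercased whitespace tokens (assumed variant)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


@dataclass
class Example:
    # Hypothetical record layout; the source does not specify one.
    reference: str
    response: str
    judge_says_hallucinated: bool  # precomputed LLM-as-Judge verdict


def find_conflicts(examples, threshold=0.5):
    """Return (example, score) pairs where ROUGE and the judge disagree.

    Assumption: a ROUGE-L F1 below `threshold` is read as "hallucinated".
    """
    conflicts = []
    for ex in examples:
        score = rouge_l_f1(ex.reference, ex.response)
        rouge_says_hallucinated = score < threshold
        if rouge_says_hallucinated != ex.judge_says_hallucinated:
            conflicts.append((ex, score))
    return conflicts
```

Under these assumptions, a correct but verbose answer (low lexical overlap, judged faithful) and a fluent answer with one wrong entity (high overlap, judged hallucinated) would both land in the conflict set, which is the kind of disagreement the curated dataset is meant to surface.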