Fact — procedure — Knowledge Tree

To examine ROUGE's failure modes, the researchers curated a dataset of instances where ROUGE and an LLM-as-Judge metric gave conflicting assessments of whether a response contained a hallucination.
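The curation step can be illustrated with a minimal sketch. It is not the researchers' actual pipeline: the names `Example`, `rouge_l_f1`, and `find_disagreements` are hypothetical, the 0.3 threshold is an arbitrary assumption, tokenization is plain whitespace splitting, and the judge verdicts are assumed to have been produced separately by an LLM judge.

```python
from dataclasses import dataclass


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 over lowercased, whitespace-split tokens (a simplification)."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


@dataclass
class Example:
    reference: str
    response: str
    judge_says_hallucinated: bool  # assumed: verdict from an LLM judge run elsewhere


def find_disagreements(examples, threshold=0.3):
    """Keep examples where the ROUGE verdict and the judge verdict differ.

    The threshold is a hypothetical value chosen for illustration only.
    """
    disagreements = []
    for ex in examples:
        score = rouge_l_f1(ex.reference, ex.response)
        rouge_says_hallucinated = score < threshold
        if rouge_says_hallucinated != ex.judge_says_hallucinated:
            disagreements.append((ex, score))
    return disagreements


if __name__ == "__main__":
    examples = [
        # Correct but verbose answer: low overlap, judge accepts it.
        Example("Paris", "The capital of France is Paris", False),
        # Exact match: both metrics agree.
        Example("Paris", "Paris", False),
        # Wrong answer with high lexical overlap: judge flags it.
        Example("the capital of france is paris",
                "the capital of france is lyon", True),
    ]
    for ex, score in find_disagreements(examples):
        print(f"{score:.3f}  {ex.response!r}")
```

In this toy data the first and third examples land in the disagreement set: a correct but verbose answer that ROUGE penalizes, and a wrong answer that ROUGE rewards for lexical overlap. These are the kinds of cases such a dataset would surface.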

Authors

Person: Not available
Organization: arXiv
Re-evaluating Hallucination Detection in LLMs - arXiv

Sources

Re-evaluating Hallucination Detection in LLMs - arXiv (arxiv.org)

Referenced by nodes (3)

hallucination concept
LLM-as-a-judge concept
ROUGE concept