Fact — claim — Knowledge Tree

The authors of 'Re-evaluating Hallucination Detection in LLMs' found that while ROUGE exhibits high precision, it fails to detect many hallucinations, whereas the LLM-as-Judge method achieves significantly higher recall and aligns more closely with human assessments.

Authors

Person: Not available Organization: arXiv
Re-evaluating Hallucination Detection in LLMs - arXiv

Sources

Re-evaluating Hallucination Detection in LLMs - arXiv arxiv.org arXiv via serper

Referenced by nodes (2)

ROUGE concept
LLM-as-a-judge concept