claim
The authors of the paper 'Re-evaluating Hallucination Detection in LLMs' demonstrate that prevailing overlap-based metrics systematically overestimate hallucination detection performance in Question Answering tasks, which leads to illusory progress in the field.
Authors
Sources
- Re-evaluating Hallucination Detection in LLMs - arXiv arxiv.org via serper