claim
Alternative metrics such as BERTScore, BLEU, and UniEval-fact exhibit substantial shortcomings in reliably detecting hallucinations in question-answering tasks, particularly under zero-shot conditions.
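
A toy case can illustrate why a surface-overlap metric may miss a hallucination. The sketch below is an illustration only, not the evaluation behind this claim. `unigram_precision` is a hypothetical helper that computes just the clipped 1-gram precision term of BLEU (no higher-order n-grams, no brevity penalty), and the example sentences are invented.

```python
from collections import Counter


def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision (simplified BLEU-1 term, no brevity penalty)."""
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(cand_tokens)
    # Each candidate token is credited at most as often as it occurs in the reference.
    overlap = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    return overlap / len(cand_tokens)


# Invented example: one answer is a correct paraphrase, the other copies the
# reference wording but gets the key fact (the year) wrong.
reference = "the eiffel tower was completed in 1889"
faithful = "it was finished in 1889"
hallucinated = "the eiffel tower was completed in 1925"

faithful_score = unigram_precision(faithful, reference)
hallucinated_score = unigram_precision(hallucinated, reference)

print(f"faithful paraphrase: {faithful_score:.3f}")
print(f"hallucinated answer: {hallucinated_score:.3f}")
```

In this toy case the answer with the wrong year outscores the correct paraphrase, because the metric rewards shared wording rather than factual agreement. It does not show how BERTScore or UniEval-fact behave, which would require running those models.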
