Fact — claim — Knowledge Tree

Sophisticated metrics including BERTScore, BLEU, and UniEval-fact show substantial disagreement with judgments from strong LLM-based evaluators, indicating limitations in capturing factual consistency.

Authors

Person: Not available Organization: arXiv
Re-evaluating Hallucination Detection in LLMs - arXiv

Sources

Re-evaluating Hallucination Detection in LLMs - arXiv arxiv.org arXiv via serper

Referenced by nodes (3)