claim
Benchmarks that score answers only as correct or incorrect fail to reveal when a large language model's expressed uncertainty is miscalibrated.
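A minimal sketch of why this holds, using invented data rather than results from the cited source: two hypothetical models answer the same ten questions and get the same six right, so an accuracy-only benchmark scores them identically. One reports 60% confidence on every answer and the other 99%. Expected calibration error (ECE), a common calibration metric chosen here for illustration, separates them. The names `accuracy`, `expected_calibration_error`, and both confidence lists are assumptions made for this example.

```python
def accuracy(correct):
    """Fraction of answers marked correct (1) rather than incorrect (0)."""
    return sum(correct) / len(correct)


def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin answers by stated confidence, then average the gap between
    each bin's accuracy and its mean confidence, weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        mean_conf = sum(c for c, _ in items) / len(items)
        bin_acc = sum(o for _, o in items) / len(items)
        ece += (len(items) / total) * abs(bin_acc - mean_conf)
    return ece


# Hypothetical outcomes: both models get the same 6 of 10 answers right.
correct = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical stated confidences for the two models.
calibrated_conf = [0.6] * 10      # confidence matches the 60% hit rate
overconfident_conf = [0.99] * 10  # near-certain on every answer

acc = accuracy(correct)  # identical for both models
ece_calibrated = expected_calibration_error(calibrated_conf, correct)
ece_overconfident = expected_calibration_error(overconfident_conf, correct)

print(f"accuracy (both models): {acc:.2f}")
print(f"ECE, calibrated model:    {ece_calibrated:.2f}")
print(f"ECE, overconfident model: {ece_overconfident:.2f}")
```

Both models score 0.60 on accuracy, while the ECE is close to 0 for the first and about 0.39 for the second. The difference is visible only because the stated confidences are recorded alongside the correct/incorrect labels.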
Authors
Sources
- Hallucination Causes: Why Language Models Fabricate Facts (mbrenndoerfer.com, via serper)
Referenced by nodes (2)
- Large Language Models concept
- benchmarks concept