perspective
Existing benchmarks for Large Language Models rely on static, narrow question sets, which restricts their coverage and leads to misleading evaluations of model truthfulness.
Authors
Sources
- A Knowledge Graph-Based Hallucination Benchmark for Evaluating ... aclanthology.org via serper
Referenced by nodes (2)
- Large Language Models concept
- benchmarks concept