claim
Existing benchmarks for evaluating Large Language Models are constrained by static, narrow question sets, resulting in limited coverage and misleading evaluations.
Authors
Sources
- A Knowledge Graph-Based Hallucination Benchmark for Evaluating ... aclanthology.org via serper
Referenced by nodes (3)
- Large Language Models concept
- hallucination concept
- benchmarks concept