perspective
Existing benchmarks for Large Language Models rely on static, narrow question sets, which restricts their coverage and produces misleading evaluations of model truthfulness.
