claim
Large language models that rank highest on popular benchmarks such as LMArena, which primarily measure user preference and satisfaction, are not necessarily the most resistant to hallucination.

Authors

Sources
