measurement
The GLM-4.5 model achieves a performance score of 54.35%, outperforming proprietary models such as Claude-4-Opus and Gemini-2.5-Flash.
Sources
- A Knowledge Graph-Based Hallucination Benchmark for Evaluating ... arxiv.org via serper
Referenced by nodes (1)
- Gemini-1.5-Flash concept