procedure
The benchmark's evaluation procedure computes mean accuracy and weighted accuracy for each of 25 models over 10 runs, then averages these per-model values across all models to obtain aggregate metrics.
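The two-stage aggregation (average over runs within a model, then average over models) can be sketched as below. This is a minimal illustration, not the benchmark's own code. The source does not define how weighted accuracy assigns weights, so the per-item `weights` list here is a hypothetical stand-in. The function names `accuracy`, `weighted_accuracy`, and `aggregate` are also illustrative.

```python
from statistics import mean


def accuracy(correct):
    """Fraction of items answered correctly in a single run."""
    return sum(correct) / len(correct)


def weighted_accuracy(correct, weights):
    """Weighted fraction correct for a single run.

    ASSUMPTION: each item carries a fixed weight; the benchmark's actual
    weighting scheme is not specified in the source.
    """
    return sum(w for c, w in zip(correct, weights) if c) / sum(weights)


def aggregate(results, weights):
    """results maps model name -> list of runs; each run is a list of bools.

    Stage 1: per model, average both metrics over its runs (10 in the benchmark).
    Stage 2: average the per-model values over all models (25 in the benchmark).
    """
    per_model = {}
    for model, runs in results.items():
        acc = mean(accuracy(r) for r in runs)
        wacc = mean(weighted_accuracy(r, weights) for r in runs)
        per_model[model] = (acc, wacc)
    agg_acc = mean(a for a, _ in per_model.values())
    agg_wacc = mean(w for _, w in per_model.values())
    return agg_acc, agg_wacc, per_model


# Toy data: 2 models, 2 runs each, 4 items (hypothetical values).
results = {
    "model_a": [[True, True, False, False], [True, True, True, False]],
    "model_b": [[True, True, True, True], [False, False, False, True]],
}
weights = [1, 1, 2, 4]
acc, wacc, per_model = aggregate(results, weights)
print(acc, wacc)  # 0.625 0.5625
```

Because the final step is an unweighted mean over models, every model contributes equally to the aggregate regardless of how its individual runs varied.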
Sources
- A Knowledge Graph-Based Hallucination Benchmark for Evaluating ... (arxiv.org)
Referenced by nodes (1)
- KGHaluBench concept