procedure
The benchmark is evaluated on 25 models, each run 10 times. For each model, mean accuracy and weighted accuracy are computed across its 10 runs; these per-model values are then averaged across all 25 models to obtain the aggregated metrics.
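The two-stage averaging can be sketched as follows. This is a minimal illustration, not the benchmark's own code: the function name `aggregate_metric`, the dictionary layout, and the assumption that each run already yields one scalar score per metric are all hypothetical. How weighted accuracy itself is computed is not specified here, so the sketch treats it as a precomputed per-run value and would be called once per metric.

```python
from statistics import fmean


def aggregate_metric(runs_by_model):
    """Average one metric over runs, then over models.

    runs_by_model: dict mapping a model name to a list of per-run scores
    (hypothetical layout; e.g. 25 models with 10 scores each).
    Returns (per_model_means, aggregated_value).
    """
    if not runs_by_model:
        raise ValueError("at least one model is required")

    # Stage 1: mean over runs, computed separately for each model.
    per_model = {
        model: fmean(scores) for model, scores in runs_by_model.items()
    }

    # Stage 2: unweighted mean of the per-model means across all models.
    aggregated = fmean(list(per_model.values()))
    return per_model, aggregated
```

Calling `aggregate_metric` once with the per-run accuracy scores and once with the per-run weighted accuracy scores would give the two aggregated metrics. Because every model contributes equally in the second stage, the result matches a flat mean over all runs only when each model has the same number of runs.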
