measurement
Among ensemble strategies for LLM judges, the Liberal Strategy achieves the highest alignment with human clinical experts across the reported metrics, particularly for the GPT-5 model.
Authors
Sources
- A Comprehensive Benchmark and Evaluation Framework for Multi ... (arxiv.org)
Referenced by nodes (2)
- LLM-as-a-judge concept
- GPT-5 concept