Claim
The Unanimous Voting strategy for ensemble LLM judges yields lower Recall and F1 scores than other aggregation strategies, indicating that it penalizes too heavily.
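The effect can be illustrated with a minimal sketch. It assumes that "unanimous voting" means the ensemble returns a positive label only when every judge votes positive, and it uses majority voting as the comparison strategy; the source does not spell out either definition. The votes and ground-truth labels are made-up toy data, not results from the cited benchmark, and all function names are hypothetical.

```python
from typing import Sequence, Tuple


def unanimous_vote(votes: Sequence[int]) -> int:
    # Assumed definition: positive only if every judge says positive.
    return int(all(votes))


def majority_vote(votes: Sequence[int]) -> int:
    # Positive if strictly more than half of the judges say positive.
    return int(sum(votes) * 2 > len(votes))


def recall_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, f1


# Toy data (illustrative only): 6 items, 3 judges each, 1 = positive.
y_true = [1, 1, 1, 1, 0, 0]
judge_votes = [
    [1, 1, 1],  # all judges agree
    [1, 1, 0],  # one dissenting judge
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],  # false positive under majority voting
    [0, 0, 0],
]

unanimous_pred = [unanimous_vote(v) for v in judge_votes]
majority_pred = [majority_vote(v) for v in judge_votes]

unanimous_recall, unanimous_f1 = recall_and_f1(y_true, unanimous_pred)
majority_recall, majority_f1 = recall_and_f1(y_true, majority_pred)

print(f"unanimous: recall={unanimous_recall:.2f}, f1={unanimous_f1:.2f}")
print(f"majority:  recall={majority_recall:.2f}, f1={majority_f1:.2f}")
```

In this toy setup a single dissenting judge is enough to flip a true positive into a false negative under unanimous voting, so Recall falls, and F1 falls with it even though Precision stays high.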
Authors
Sources
- A Comprehensive Benchmark and Evaluation Framework for Multi ... arxiv.org via serper
Referenced by nodes (3)
- LLM-as-a-judge concept
- recall concept
- F1 score concept