claim
LLM judges suffer from design flaws, such as low "schematic adherence" and "factor collapse," that misalign evaluation results with the stated evaluation criteria (Feuer et al., 2025).

