measurement
The MedDialogRubrics temporal analysis reveals a behavioral gap of up to 20% in rubric coverage, indicating that static snapshots of model performance obscure the clinical reasoning process.
Sources
- A Comprehensive Benchmark and Evaluation Framework for Multi ... (arxiv.org)
Referenced by nodes (1)
- MedDialogRubrics concept