procedure
The evaluation pipeline for doctor agents proceeds in five steps: (1) generating a multi-turn consultation by having the model interact with a controlled patient agent; (2) extracting inquiry actions and reasoning patterns from the dialogue context; (3) applying structured scoring using the MedDialogRubrics LLM-as-a-Judge pipeline; (4) performing consistency verification, with safety penalties applied for discrepancies; (5) aggregating scores at both the per-case and per-dataset levels.
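As a rough illustration of steps (3) through (5), the sketch below shows one way judge outputs could be turned into per-case and per-dataset scores. It is not the MedDialogRubrics implementation: the `CaseResult` structure, the fixed per-discrepancy penalty, the floor at zero, and the unweighted mean are all assumptions made for this example, since the source does not specify the scoring formula.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CaseResult:
    """Judge output for one consultation (hypothetical structure)."""
    case_id: str
    rubric_hits: list[bool]  # one entry per rubric item: True if the judge marked it satisfied
    discrepancies: int       # issues flagged during consistency verification


def case_score(result: CaseResult, penalty: float = 0.1) -> float:
    """Fraction of rubric items met, minus an assumed fixed penalty per discrepancy."""
    if not result.rubric_hits:
        return 0.0
    base = sum(result.rubric_hits) / len(result.rubric_hits)
    # Assumption: penalties subtract linearly and the score is floored at 0.
    return max(0.0, base - penalty * result.discrepancies)


def dataset_score(results: list[CaseResult], penalty: float = 0.1) -> float:
    """Unweighted mean of per-case scores (assumed aggregation rule)."""
    return mean(case_score(r, penalty) for r in results)


if __name__ == "__main__":
    results = [
        CaseResult("case-a", [True, True, False, True], discrepancies=1),
        CaseResult("case-b", [True, False], discrepancies=0),
    ]
    for r in results:
        print(r.case_id, case_score(r))
    print("dataset", dataset_score(results))
```

Keeping the per-case function separate from the dataset aggregate makes it easy to swap in a different penalty scheme or a weighted mean without touching the rest of the pipeline.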
Authors
Sources
- A Comprehensive Benchmark and Evaluation Framework for Multi ... arxiv.org via serper
Referenced by nodes (1)
- MedDialogRubrics concept