Knowledge Tree

The evaluation pipeline for doctor agents proceeds in five steps: (1) generating a multi-turn consultation by having the model under evaluation interact with a controlled patient agent; (2) extracting inquiry actions and reasoning patterns from the dialogue context; (3) applying structured scoring with the MedDialogRubrics LLM-as-a-Judge pipeline; (4) performing consistency verification, with safety penalties for discrepancies; (5) aggregating scores at both per-case and per-dataset granularity.
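Steps (4) and (5) can be illustrated with a minimal Python sketch. This is not the MedDialogRubrics implementation: the `RubricJudgment` type, the per-rubric weights, the flat per-discrepancy penalty, and the unweighted mean across cases are all assumptions chosen only to show how per-case and per-dataset scores might relate.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RubricJudgment:
    """One judge verdict for one rubric item (hypothetical structure)."""
    rubric_id: str
    met: bool             # did the judge decide the rubric was satisfied?
    weight: float = 1.0   # assumed per-rubric weight, not from the source


def case_score(judgments, discrepancies=0, penalty=0.1):
    """Weighted fraction of rubrics met, minus an assumed flat penalty
    for each discrepancy found in consistency verification, floored at 0."""
    total = sum(j.weight for j in judgments)
    if total == 0:
        return 0.0
    earned = sum(j.weight for j in judgments if j.met)
    return max(0.0, earned / total - penalty * discrepancies)


def dataset_score(case_scores):
    """Per-dataset score as the unweighted mean of per-case scores (assumed)."""
    return mean(case_scores) if case_scores else 0.0


# Illustrative data only; the rubric ids are made up.
case_a = [RubricJudgment("asks_onset", True), RubricJudgment("asks_allergies", False)]
case_b = [RubricJudgment("asks_onset", True), RubricJudgment("asks_medications", True)]

per_case = [case_score(case_a), case_score(case_b, discrepancies=1)]
overall = dataset_score(per_case)
```

Keeping the penalty inside the per-case score means a single unsafe or inconsistent consultation lowers its own case result before averaging. A real pipeline might instead report penalties as a separate column.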

Authors

Person: Not available
Organization: arXiv
A Comprehensive Benchmark and Evaluation Framework for Multi ...

Sources

A Comprehensive Benchmark and Evaluation Framework for Multi ... (arxiv.org)

Referenced by nodes (1)

MedDialogRubrics concept