reference
MedDialogRubrics is a benchmark and evaluation framework for assessing the multi-turn inquiry abilities of medical large language models (LLMs). It focuses on fine-grained, human-aligned evaluation of the diagnostic process rather than single-turn question answering or final-diagnosis accuracy alone.
Sources
- A Comprehensive Benchmark and Evaluation Framework for Multi ... (arxiv.org)
Referenced by nodes (2)
- Question Answering concept
- MedDialogRubrics concept
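The fine-grained, rubric-based evaluation described above can be sketched minimally as follows. All names here (`Rubric`, `score_dialogue`, the keyword-matching judge) are illustrative assumptions, not the actual MedDialogRubrics API; the real framework's criteria and judging procedure are defined in the cited paper.

```python
# Hypothetical sketch of rubric-based scoring for a multi-turn medical
# dialogue; NOT the MedDialogRubrics implementation.
from dataclasses import dataclass

@dataclass
class Rubric:
    criterion: str   # e.g. "asks about pain onset"
    weight: float    # relative importance of the criterion

def score_dialogue(turns, rubrics, judge):
    """Return the weighted fraction of rubric criteria the dialogue satisfies.

    `judge` is any callable (a human rater or an LLM grader in practice)
    that returns True when the dialogue meets a criterion.
    """
    total = sum(r.weight for r in rubrics)
    earned = sum(r.weight for r in rubrics if judge(turns, r.criterion))
    return earned / total if total else 0.0

# Toy usage with a keyword-matching stand-in for a real judge.
turns = ["Doctor: When did the pain start?", "Patient: Two days ago."]
rubrics = [Rubric("pain", 2.0), Rubric("allergies", 1.0)]
judge = lambda ts, c: any(c in turn.lower() for turn in ts)
print(score_dialogue(turns, rubrics, judge))  # 2.0 of 3.0 weight satisfied
```

The weighted-average design means a per-dialogue score stays in [0, 1] regardless of how many criteria a given case defines, which keeps scores comparable across cases.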