procedure
In the 'LLM-as-judge' evaluation framework, a model-based grader evaluates each conversation between a doctor agent and a simulated patient agent against a set of rubric criteria, outputting a binary verdict of 'Satisfied' or 'Not Satisfied' for each criterion.
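The grading procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the framework's actual implementation: the names `Turn`, `build_prompt`, `parse_verdict`, `grade_conversation`, and `make_keyword_judge` are hypothetical, the prompt wording is invented, and the judge is passed in as a plain callable so that a real model call can be swapped in. An offline keyword matcher stands in for the model here so the example runs without any API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

SATISFIED = "Satisfied"
NOT_SATISFIED = "Not Satisfied"


@dataclass
class Turn:
    role: str  # e.g. "doctor" or "patient"
    text: str


def build_prompt(transcript: str, criterion: str) -> str:
    """Assemble a grading prompt (wording is an assumption, not from the source)."""
    return (
        "Answer with exactly 'Satisfied' or 'Not Satisfied'.\n"
        f"Criterion: {criterion}\n"
        f"Transcript:\n{transcript}"
    )


def parse_verdict(raw: str) -> str:
    """Map the judge's raw text onto one of the two allowed verdicts."""
    normalized = raw.strip().lower()
    # Check the negative label first, since it also contains "satisfied".
    if normalized.startswith("not satisfied"):
        return NOT_SATISFIED
    if normalized.startswith("satisfied"):
        return SATISFIED
    raise ValueError(f"Unrecognized verdict: {raw!r}")


def grade_conversation(
    conversation: List[Turn],
    rubric: List[str],
    judge: Callable[[str], str],
) -> Dict[str, str]:
    """Ask the judge about each rubric criterion and collect binary verdicts."""
    transcript = "\n".join(f"{t.role}: {t.text}" for t in conversation)
    return {
        criterion: parse_verdict(judge(build_prompt(transcript, criterion)))
        for criterion in rubric
    }


def make_keyword_judge(keywords: Dict[str, str]) -> Callable[[str], str]:
    """Hypothetical offline stand-in for a model call: keyword lookup in the transcript."""

    def judge(prompt: str) -> str:
        criterion_part, transcript_part = prompt.split("\nTranscript:\n", 1)
        criterion = criterion_part.split("Criterion: ", 1)[1]
        found = keywords[criterion] in transcript_part.lower()
        return SATISFIED if found else NOT_SATISFIED

    return judge


# Usage with invented example data.
conversation = [
    Turn("patient", "I have had a headache for three days."),
    Turn("doctor", "Do you have any allergies to medication?"),
]
rubric = [
    "Doctor asks about medication allergies",
    "Doctor recommends a follow-up visit",
]
judge = make_keyword_judge({rubric[0]: "allerg", rubric[1]: "follow-up"})
verdicts = grade_conversation(conversation, rubric, judge)
for criterion, verdict in verdicts.items():
    print(f"{criterion}: {verdict}")
```

Keeping the verdict parser strict (raising on anything other than the two labels) is one possible design choice; it surfaces malformed judge output instead of silently counting it as a failure.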
Sources
- A Comprehensive Benchmark and Evaluation Framework for Multi ... (arxiv.org)
Referenced by nodes (1)
- LLM-as-a-judge concept