procedure
In the 'LLM-as-judge' evaluation framework, conversations between doctor agents and simulated patient agents are scored against rubric criteria by a model-based grader, which outputs a binary verdict of 'Satisfied' or 'Not Satisfied' for each rubric criterion.
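The grading loop described above might be sketched as follows. This is a minimal illustration, not the framework's actual implementation: the names `grade_conversation`, `parse_verdict`, and `toy_grader` are hypothetical, and the keyword-matching `toy_grader` is only a stand-in for the model-based grader so the example runs without any model call.

```python
from typing import Callable, Dict, List

SATISFIED = "Satisfied"
NOT_SATISFIED = "Not Satisfied"

# One conversation turn, e.g. {"role": "doctor", "content": "..."} (assumed shape).
Turn = Dict[str, str]
# A grader takes (transcript, criterion) and returns raw verdict text.
Grader = Callable[[str, str], str]


def parse_verdict(raw: str) -> str:
    """Map raw grader output onto one of the two binary verdict labels."""
    text = raw.strip().lower()
    # Check the negative label first, because it also contains "satisfied".
    if text.startswith("not satisfied"):
        return NOT_SATISFIED
    if text.startswith("satisfied"):
        return SATISFIED
    raise ValueError(f"Unrecognized verdict: {raw!r}")


def grade_conversation(
    conversation: List[Turn], rubric: List[str], grader: Grader
) -> Dict[str, str]:
    """Return one binary verdict per rubric criterion."""
    transcript = "\n".join(f"{t['role']}: {t['content']}" for t in conversation)
    return {
        criterion: parse_verdict(grader(transcript, criterion))
        for criterion in rubric
    }


def toy_grader(transcript: str, criterion: str) -> str:
    """Hypothetical stand-in for the model: match the criterion's last word."""
    keyword = criterion.split()[-1].lower()
    return SATISFIED if keyword in transcript.lower() else NOT_SATISFIED


conversation = [
    {"role": "doctor", "content": "Do you have any allergies?"},
    {"role": "patient", "content": "No, none that I know of."},
]
rubric = ["Doctor asks about allergies", "Doctor asks about smoking"]

verdicts = grade_conversation(conversation, rubric, toy_grader)
for criterion, verdict in verdicts.items():
    print(f"{criterion}: {verdict}")
```

In a real setup, `toy_grader` would be replaced by a function that prompts the judge model with the transcript and one criterion; keeping the grader as a plain callable lets the parsing and aggregation logic be tested independently of any model.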
