Fact — procedure — Knowledge Tree

In the 'LLM-as-judge' evaluation framework, each conversation between a doctor agent and a simulated patient agent is evaluated against a set of rubric criteria by a model-based grader, which outputs a binary verdict of 'Satisfied' or 'Not Satisfied' for each criterion.
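
The grading loop described above can be sketched as follows. This is a minimal illustration, not the framework's actual implementation: the names `Turn`, `build_prompt`, `parse_verdict`, `grade`, and `toy_judge`, the prompt wording, and the example rubric are all assumptions made for this sketch. The judge is passed in as a plain callable, so a real model client could be substituted for the keyword-based stand-in used here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

SATISFIED = "Satisfied"
NOT_SATISFIED = "Not Satisfied"


@dataclass
class Turn:
    speaker: str  # e.g. "doctor" or "patient"
    text: str


# Hypothetical judge interface: takes a prompt string, returns raw text.
Judge = Callable[[str], str]


def build_prompt(conversation: List[Turn], criterion: str) -> str:
    """Render the transcript and one rubric criterion into a judge prompt.

    The prompt wording is an assumption, not taken from the source.
    """
    transcript = "\n".join(f"{t.speaker}: {t.text}" for t in conversation)
    return (
        f"Conversation:\n{transcript}"
        f"\n\nRubric criterion: {criterion}"
        "\nAnswer with exactly 'Satisfied' or 'Not Satisfied'."
    )


def parse_verdict(raw: str) -> str:
    """Map raw judge output to one of the two verdicts.

    Uses an exact match rather than a substring test, because the text
    'Not Satisfied' itself contains 'Satisfied'. Anything unrecognised is
    treated as Not Satisfied (a conservative choice made for this sketch).
    """
    cleaned = raw.strip().strip(".'\"").lower()
    return SATISFIED if cleaned == "satisfied" else NOT_SATISFIED


def grade(conversation: List[Turn], rubric: List[str], judge: Judge) -> Dict[str, str]:
    """Return one binary verdict per rubric criterion."""
    return {
        criterion: parse_verdict(judge(build_prompt(conversation, criterion)))
        for criterion in rubric
    }


# --- Toy example (illustrative data, not from the source) ---
rubric = [
    "The doctor asks about drug allergies.",
    "The doctor asks about smoking history.",
]
TOY_KEYWORDS = {rubric[0]: "allerg", rubric[1]: "smok"}


def toy_judge(prompt: str) -> str:
    """Stand-in for a model call: a keyword lookup over the transcript."""
    transcript, _, rest = prompt.partition("\n\nRubric criterion: ")
    criterion = rest.splitlines()[0]
    keyword = TOY_KEYWORDS.get(criterion, "")
    found = bool(keyword) and keyword in transcript.lower()
    return "Satisfied" if found else "Not Satisfied"


conversation = [
    Turn("doctor", "Do you have any allergies to medication?"),
    Turn("patient", "No, none that I know of."),
]
verdicts = grade(conversation, rubric, toy_judge)
```

With this toy data, the first criterion is judged 'Satisfied' and the second 'Not Satisfied'. Keeping the judge behind a callable also makes the parsing and aggregation logic testable without any model access.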

Authors

Person: Not available Organization: arXiv
A Comprehensive Benchmark and Evaluation Framework for Multi ...

Sources

A Comprehensive Benchmark and Evaluation Framework for Multi ... (arXiv, arxiv.org)

Referenced by nodes (1)

LLM-as-a-judge concept