Claim
The evaluation of medical agents has shifted from n-gram overlap metrics such as BLEU and ROUGE to action-oriented benchmarks such as MedAgentBench and MedAgentBoard.
Authors
Sources
- A Comprehensive Benchmark and Evaluation Framework for Multi ... (arxiv.org)