measurement
The HealthBench framework supports multi-turn interactions, includes key-point rubrics, is expert-validated, and contains 48,562 rubric criteria.
Authors
Sources
- A Comprehensive Benchmark and Evaluation Framework for Multi ... (arxiv.org)
Referenced by nodes (1)
- multi-turn conversations concept