measurement
Wu et al. (2025a) released the RAIDEN Benchmark, which consists of 40,000 multi-turn dialogues for LLM agents.
Sources
- A Survey of Incorporating Psychological Theories in LLMs (arXiv, arxiv.org)
Referenced by nodes (2)
- LLM-based agent concept
- multi-turn conversations concept