reference
The paper 'PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change' by Valmeekam et al. (2024) presents an extensible benchmark for evaluating how well large language models plan and reason about change.
Authors
Sources
- Combining large language models with enterprise knowledge graphs www.frontiersin.org via serper
Referenced by nodes (3)
- Large Language Models concept
- reasoning concept
- planning concept