reference
The paper 'Planbench: an extensible benchmark for evaluating large language models on planning and reasoning about change' by Valmeekam et al. (2024) presents a benchmark designed to evaluate the planning and reasoning capabilities of large language models.

Authors

Sources

Referenced by nodes (3)