formula
Li et al. (2024b) demonstrated that constant-depth Transformers without Chain-of-Thought (CoT) are restricted to parallelizable circuit classes such as AC0 or NC1, while adding intermediate reasoning steps enables the model to solve any problem in the complexity class P.
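A hedged formal sketch of the claim (the notation below is illustrative, not taken from the source; class names follow the sentence above):

```latex
% Without CoT: constant-depth Transformers stay inside small parallel circuit classes.
\mathrm{TF}_{\text{const-depth}}^{\text{no CoT}} \subseteq \mathrm{NC}^1
\quad (\text{with } \mathrm{AC}^0 \subseteq \mathrm{NC}^1),
% With polynomially many CoT steps: every polynomial-time decidable problem becomes solvable.
\qquad
\mathrm{P} \subseteq \mathrm{TF}_{\text{const-depth}}^{\,\mathrm{poly}(n)\text{ CoT steps}}.
```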
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org)
Referenced by nodes (1)
- chain-of-thought concept