reference
Garg et al. (2022) demonstrated that Transformers can learn complex function classes in context, including two-layer neural networks and four-layer decision trees, and generalize across them.
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org)
Referenced by nodes (1)
- Transformers concept