reference
The paper 'Training nonlinear transformers for chain-of-thought inference: a theoretical generalization analysis' provides a theoretical analysis of how nonlinear transformers generalize when trained for chain-of-thought inference.
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models arxiv.org via serper
Referenced by nodes (2)
- chain-of-thought concept
- generalization concept