measurement
Sanford et al. (2023) introduced the 'sparse averaging' task and showed, via communication-complexity arguments, that Transformers can solve it with cost growing only logarithmically in the input length, whereas RNNs and feed-forward networks incur polynomial cost.
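A minimal sketch of what such a task looks like, assuming (as an illustration, not the paper's exact formalization) that each position carries a value and a small set of indices, and the target at each position is the average of the values at its index set:

```python
# Hypothetical sketch of a sparse-averaging-style task. Assumes each
# position i has a value values[i] and a sparse index set index_sets[i];
# the target output at i is the mean of values[j] for j in index_sets[i].

def sparse_average(values, index_sets):
    """For each position, average the values at its index set."""
    return [sum(values[j] for j in s) / len(s) for s in index_sets]

# Example: 4 positions, each referencing 2 others.
values = [1.0, 2.0, 3.0, 4.0]
index_sets = [{1, 2}, {0, 3}, {0, 1}, {2, 3}]
print(sparse_average(values, index_sets))  # [2.5, 2.5, 1.5, 3.5]
```

Intuitively, attention lets a Transformer gather exactly the referenced positions in one step, while a sequential or fully-connected model must route much more information to do the same.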
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org, via serper)
Referenced by nodes (1)
- Transformers concept