Claim
Meyer et al. (2025) formally prove that the amount of information a Transformer can memorize via prompt tuning is linearly bounded by the prompt length, establishing a capacity bottleneck.
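The linear bound can be stated schematically as follows; the symbols here (prompt length $m$, vocabulary $V$, constant $C$) are illustrative assumptions, not the paper's exact notation:

```latex
% Schematic form of the prompt-tuning capacity bound (illustrative, not the paper's formulation):
% the information I memorizable via a prompt p of length m tokens is at most linear in m,
\[
  I_{\text{mem}}(p) \;\le\; C \cdot m,
\]
% where C is a model-dependent constant per token, e.g. at most \log_2 |V| bits
% for a vocabulary V, since each prompt token can carry no more than that.
```

Intuitively, a prompt of $m$ discrete tokens can encode at most $m \log_2 |V|$ bits, so any information the model memorizes through the prompt alone inherits this linear ceiling.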
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org, via serper)
Referenced by nodes (1)
- Transformer concept