reference
The paper 'On mesa-optimization in autoregressively trained transformers: emergence and capability' was published in Advances in Neural Information Processing Systems (NeurIPS) and is cited in Section 6.2.1 of 'A Survey on the Theory and Mechanism of Large Language Models'.
