claim
Von Oswald et al. (2023b) proposed a constructive approach in the auto-regressive setting, reaching conclusions similar to those on online gradient descent in Transformer models.
