claim
Akyürek et al. (2022) showed by explicit construction that Transformer layers can implement four basic operations (move, multiply, divide, and affine transformation), and that these operations can be composed to carry out a gradient descent step for a linear model.
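
To make the composition concrete, the following is a minimal sketch of how four such operations can be chained into one gradient descent step on the squared error (w·x − y)² for a single example. It is an illustration under stated assumptions, not the construction from the paper: the functions here are plain Python stand-ins acting on a dictionary of named vector slots, whereas the paper realizes the operations with attention and feed-forward layers. The slot names (`w`, `x`, `y`, `tmp`, `wx`, `resid`, `grad`), the helper `gd_step`, and the simplified semantics of each primitive are all hypothetical. `div` is defined only for completeness and is not needed for this step.

```python
# Toy stand-ins for the four primitives. These are simplified, hypothetical
# versions operating on named slots, NOT the paper's Transformer constructions.

def mov(state, src, dst):
    """Copy the vector in slot `src` into slot `dst`."""
    state[dst] = list(state[src])

def mul(state, a, b, dst):
    """Elementwise product; a length-1 slot is broadcast as a scalar."""
    u, v = state[a], state[b]
    if len(u) == 1:
        u = u * len(v)
    elif len(v) == 1:
        v = v * len(u)
    state[dst] = [p * q for p, q in zip(u, v)]

def div(state, a, b, dst):
    """Divide every entry of slot `a` by the scalar in slot `b` (unused below)."""
    s = state[b][0]
    state[dst] = [p / s for p in state[a]]

def aff(state, srcs, W, bias, dst):
    """Affine map: W @ concat(slots in `srcs`) + bias, with W given as rows."""
    v = [e for s in srcs for e in state[s]]
    state[dst] = [sum(wij * vj for wij, vj in zip(row, v)) + bi
                  for row, bi in zip(W, bias)]

def gd_step(state, lr):
    """One step w <- w - 2*lr*(w.x - y)*x, built only from the primitives."""
    d = len(state["w"])
    mov(state, "x", "tmp")                      # bring the input into a work slot
    mul(state, "w", "tmp", "wx")                # elementwise w_i * x_i
    # Sum the products and subtract y: resid = w.x - y
    aff(state, ["wx", "y"], [[1.0] * d + [-1.0]], [0.0], "resid")
    mul(state, "resid", "tmp", "grad")          # (w.x - y) * x, half the gradient
    # w_new = I @ w + (-2*lr) * I @ grad, expressed as a single affine map
    W = [[1.0 if j == i else (-2.0 * lr if j == d + i else 0.0)
          for j in range(2 * d)] for i in range(d)]
    aff(state, ["w", "grad"], W, [0.0] * d, "w")
    return state["w"]

state = {"w": [0.0, 0.0], "x": [1.0, 2.0], "y": [3.0]}
w_new = gd_step(state, lr=0.125)
print(w_new)  # [0.75, 1.5]
```

With these values the prediction moves from 0 to 3.75 against a target of 3, so the squared error drops from 9 to 0.5625 after one step. Expressing the update as a single affine map over the concatenated slots is a design choice of this sketch; it keeps the step a fixed sequence of primitive calls with no control flow that depends on the data.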
