reference
Dehghani et al. (2018) introduced the Universal Transformer, which improves generalization by sharing parameters across layers and allowing the model to flexibly adjust its iterative depth.
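
The two ideas in this summary, reusing one set of parameters at every depth and letting the number of refinement steps vary, can be illustrated with a small sketch. This is not the paper's implementation: the names `shared_step` and `run_shared_depth`, the scalar `weight` and `bias`, and the convergence-based stopping rule are all assumptions made for illustration, and the stopping rule is a simplified stand-in for a learned halting mechanism.

```python
import math


def shared_step(state, weight, bias):
    """One refinement step. The same (weight, bias) pair is reused at every depth."""
    return [math.tanh(weight * x + bias) for x in state]


def run_shared_depth(state, weight, bias, max_steps=8, tol=1e-3):
    """Apply the shared step repeatedly, stopping early once the state settles.

    Assumption of this sketch: "settled" means the largest per-element change
    falls below tol. A real model would learn when to halt instead.
    """
    steps = 0
    for _ in range(max_steps):
        new_state = shared_step(state, weight, bias)
        steps += 1
        change = max(abs(a - b) for a, b in zip(new_state, state))
        state = new_state
        if change < tol:
            break
    return state, steps


if __name__ == "__main__":
    final_state, used = run_shared_depth([1.0, -0.5], weight=0.5, bias=0.1)
    print(final_state, used)
```

Because the parameters do not depend on the step index, the parameter count stays fixed no matter how many steps are run; only `max_steps` and the stopping rule determine the effective depth for a given input.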
