reference
Garg et al. (2022) showed that Transformers trained from scratch can learn complex function classes in context, including two-layer neural networks and decision trees of depth 4.
