claim
Cheng et al. (2023) and Collins et al. (2024) studied whether Transformers can learn a broader class of nonlinear functions, extending the analysis beyond linear-attention settings.
