claim
Nichani et al. (2025) demonstrated that a one-layer Transformer combining a single self-attention layer with an MLP can achieve perfect prediction accuracy on a factual recall task once the number of self-attention parameters or the number of MLP parameters scales near-linearly with the number of stored facts.
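
A minimal sketch of the parameter-counting side of this claim, assuming a toy (subject, relation) → answer recall task. The architecture, the width rule, and all dimensions below are illustrative assumptions, not the construction analyzed by Nichani et al. (2025); the point is only that the attention + MLP parameter budget of a one-layer model can track the number of facts near-linearly.

```python
# Illustrative sketch, NOT the authors' code: a one-layer Transformer
# (single self-attention head + MLP) whose core parameter count is set
# to grow near-linearly with the number of facts N.
import math

import torch
import torch.nn as nn


class OneLayerTransformer(nn.Module):
    """One self-attention layer followed by one MLP block, as in the claim."""

    def __init__(self, vocab_size: int, d_model: int, d_mlp: int, seq_len: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Parameter(torch.zeros(seq_len, d_model))
        self.attn = nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_mlp), nn.ReLU(), nn.Linear(d_mlp, d_model)
        )
        self.unembed = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        x = self.embed(tokens) + self.pos      # (batch, seq_len, d_model)
        attn_out, _ = self.attn(x, x, x)       # the single self-attention layer
        x = x + attn_out
        x = x + self.mlp(x)                    # the single MLP block
        return self.unembed(x[:, -1])          # logits for the answer token


# The claim concerns the self-attention / MLP parameter count, so count only
# those. With num_heads=1 and d_mlp = 4 * d_model, attention plus MLP hold
# roughly 12 * d_model**2 parameters, so d_model ~ sqrt(N / 12) keeps the
# core budget near-linear in the number of facts N (an assumed width rule).
for n_facts in (1_000, 10_000, 100_000):
    d_model = max(4, int(math.sqrt(n_facts / 12)))
    model = OneLayerTransformer(
        vocab_size=2 * n_facts, d_model=d_model, d_mlp=4 * d_model, seq_len=2
    )
    logits = model(torch.randint(0, 2 * n_facts, (1, 2)))  # sanity forward pass
    core = sum(p.numel() for p in model.attn.parameters()) + sum(
        p.numel() for p in model.mlp.parameters()
    )
    print(f"N={n_facts:>7}  attn+MLP params={core:>9}  params/fact={core / n_facts:.2f}")
```

The roughly constant params/fact ratio mirrors the near-linear scaling in the claim; actually reaching perfect accuracy additionally depends on the task distribution and training analysis given in the paper.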

Authors

Eshaan Nichani, Jason D. Lee, Alberto Bietti

Sources

Nichani, E., Lee, J. D., & Bietti, A. (2025). Understanding Factual Recall in Transformers via Associative Memories. arXiv:2412.06538.

Referenced by nodes (2)