claim
Yingqian Cui et al. demonstrate that multi-head attention with a substantial embedding dimension outperforms single-head attention on in-context learning tasks.
