Claim
Yingqian Cui et al. demonstrate that, when the embedding dimension is sufficiently large, multi-head attention outperforms single-head attention on in-context learning tasks.
Authors
Sources
- Track: Poster Session 3, AISTATS 2026 (virtual.aistats.org, via serper)
Referenced by nodes (1)
- In-Context Learning concept