claim
Yingqian Cui, Jie Ren, Pengfei He, Hui Liu, Jiliang Tang, and Yue Xing present a theoretical analysis comparing the exact convergence behavior of single-head and multi-head attention in transformers performing in-context learning on linear regression tasks.
