claim
Different attention patterns can be learned to generate bounded outputs, and interpretability via local ("myopic") analysis can be provably misleading on Transformers, according to Wen et al. (2023).

Authors

Sources

Referenced by nodes (2)