claim
He et al. (2025a) provide empirical and theoretical evidence that fine-tuning attention layers is more critical to downstream-task performance than fine-tuning MLP layers.

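To make the claim concrete, below is a minimal sketch of attention-only fine-tuning using PyTorch and Hugging Face Transformers: every parameter is frozen, then only the attention sublayers (plus the task head) are unfrozen and passed to the optimizer. The model name `bert-base-uncased`, the binary classification setup, and the name-based `"attention"` filter are illustrative assumptions, not He et al.'s actual experimental protocol.

```python
import torch
from transformers import AutoModelForSequenceClassification

# Hypothetical setup: a BERT classifier; He et al. may use different models/tasks.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Freeze everything by default.
for param in model.parameters():
    param.requires_grad = False

# Unfreeze only parameters belonging to attention sublayers. The substring
# "attention" matches BERT's module names (e.g. encoder.layer.0.attention);
# other architectures may need a different name filter.
for name, param in model.named_parameters():
    if "attention" in name:
        param.requires_grad = True

# The classification head must also stay trainable for a downstream task.
for param in model.classifier.parameters():
    param.requires_grad = True

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(f"{len(trainable)} trainable parameter tensors (attention + head)")

# Only the unfrozen parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=2e-5
)
```

Swapping the filter to `"intermediate"`/`"output.dense"` (BERT's MLP sublayers) gives the contrasting MLP-only condition the claim compares against.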