Claim
He et al. (2025a) provide empirical and theoretical evidence that fine-tuning attention layers is more critical for downstream-task performance than fine-tuning MLP layers (see the sketch after this card).
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org)
Referenced by nodes (1)
- fine-tuning concept
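
To illustrate the setup the claim compares, here is a minimal PyTorch sketch of attention-only fine-tuning: all parameters are frozen except the attention sublayers, so only those weights receive updates. The block architecture, dimensions, and loss are illustrative assumptions, not the setup used by He et al. (2025a).

```python
# Minimal sketch of attention-only fine-tuning. The module names,
# sizes, and loss below are illustrative, not from He et al. (2025a).
import torch
import torch.nn as nn

class Block(nn.Module):
    """A pre-norm transformer block with attention and MLP sublayers."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.norm2(x))
        return x

model = nn.Sequential(*[Block() for _ in range(2)])

# Enable gradients only for attention parameters; MLPs (and norms)
# stay frozen, mirroring the attention-vs-MLP comparison in the claim.
for name, param in model.named_parameters():
    param.requires_grad = "attn" in name

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)

# One illustrative fine-tuning step on random data.
x = torch.randn(8, 16, 64)        # (batch, seq_len, d_model)
loss = model(x).pow(2).mean()     # placeholder loss
loss.backward()
optimizer.step()
```

Flipping the mask ("attn" not in name) would give the MLP-only variant the claim contrasts against.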