claim
Ren and Liu (2025) reveal that Transformers have an inherent bias toward learning distributions with lower entropy than the true target, a bias primarily driven by the feed-forward (FFN) modules.
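To make the claim concrete, a minimal sketch of what "lower entropy than the true target" means: a model distribution that is more peaked than the target has lower Shannon entropy. The numbers below are hypothetical illustrations, not data from Ren and Liu (2025).

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Hypothetical next-token distributions (illustrative only):
true_dist  = [0.4, 0.3, 0.2, 0.1]    # target distribution
model_dist = [0.6, 0.25, 0.1, 0.05]  # sharper (more peaked) learned distribution

# The claimed bias: the learned distribution has lower entropy than the target.
print(entropy(model_dist) < entropy(true_dist))  # True
```

The comparison holds because concentrating probability mass on fewer outcomes always reduces Shannon entropy.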
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org)
Referenced by nodes (2)
- Transformers concept
- entropy concept