Fact — claim — Knowledge Tree

Ren and Liu (2025) show that Transformers are inherently biased toward learning distributions with lower entropy than the true target distribution, and that this bias is driven primarily by the feed-forward (FFN) modules.
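
To make "lower entropy than the true target" concrete, here is a minimal sketch that compares the Shannon entropy of a target distribution with that of a sharper learned distribution. The names `shannon_entropy`, `target`, and `learned`, and all the probability values, are assumptions chosen purely for illustration. They are not taken from Ren and Liu (2025) and do not reproduce any of their measurements.

```python
import math


def shannon_entropy(probs):
    """Return the Shannon entropy (in nats) of a discrete distribution."""
    # Terms with p == 0 contribute nothing, by the convention 0 * log(0) = 0.
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical next-token distributions over a 4-symbol vocabulary.
# These values are illustrative only, not results from the cited work.
target = [0.4, 0.3, 0.2, 0.1]       # assumed "true" distribution
learned = [0.7, 0.2, 0.07, 0.03]    # assumed model output, more peaked

h_target = shannon_entropy(target)
h_learned = shannon_entropy(learned)

# A positive gap means the learned distribution is lower-entropy
# (more concentrated) than the target, which is the kind of bias
# the claim describes.
entropy_gap = h_target - h_learned

print(f"H(target)  = {h_target:.3f} nats")
print(f"H(learned) = {h_learned:.3f} nats")
print(f"gap        = {entropy_gap:.3f} nats")
```

In this toy setup the learned distribution puts more mass on its most likely symbol, so its entropy is lower than the target's. Measuring such a gap on a real model would require access to the true target distribution, which this sketch simply assumes.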

Authors

Person: Not available
Organization: arXiv
A Survey on the Theory and Mechanism of Large Language Models

Sources

A Survey on the Theory and Mechanism of Large Language Models (arxiv.org, arXiv)

Referenced by nodes (2)

Transformers concept
entropy concept