Claim
Researchers (2024) identified a functional bifurcation within Transformer layers: lower layers transform representations from pre-training priors into context-aware embeddings, while middle-to-higher layers act as "answer writers" that causally integrate information from previously generated Chain-of-Thought (CoT) steps.
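One way to probe the second half of this claim (that specific layers causally read from CoT tokens) is a layer-wise causal ablation: skip one Transformer block at the CoT token positions and measure how much the final answer token's logit drops. The sketch below is a minimal illustration of that idea with GPT-2 via Hugging Face transformers, not the survey's actual method; the prompt, the CoT token span, and the answer token are all hypothetical assumptions.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Illustrative causal-ablation probe (assumption, not the survey's setup):
# make one block an identity map at the CoT positions and record the drop
# in the answer token's logit. Drops concentrated in middle-to-higher
# layers would be consistent with the "answer writer" claim.

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tok = GPT2Tokenizer.from_pretrained("gpt2")

# Hypothetical prompt with two worked CoT steps; the span is approximate.
prompt = "Q: what is (3 + 4) * 2? Step 1: 3 + 4 = 7. Step 2: 7 * 2 = 14. Answer:"
ids = tok(prompt, return_tensors="pt").input_ids
cot_positions = list(range(10, ids.shape[1] - 2))  # rough span of the CoT steps
answer_id = tok.encode(" 14")[0]                   # first token of " 14"

def answer_logit(skip_layer=None):
    """Forward pass; optionally skip one block at the CoT positions."""
    handle = None
    if skip_layer is not None:
        def hook(module, inputs, output):
            hidden = output[0].clone()
            # Replace the block's output with its input at CoT positions,
            # i.e. the block contributes nothing there (crude ablation).
            hidden[:, cot_positions, :] = inputs[0][:, cot_positions, :]
            return (hidden,) + output[1:]
        handle = model.transformer.h[skip_layer].register_forward_hook(hook)
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    if handle is not None:
        handle.remove()
    return logits[answer_id].item()

base = answer_logit()
for layer in range(model.config.n_layer):
    drop = base - answer_logit(skip_layer=layer)
    print(f"layer {layer:2d}: answer-logit drop {drop:+.3f}")
```

If the claim holds, the drops should peak in the middle-to-higher layers; a flat profile across all layers would argue against layer-localized CoT integration.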
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org)
Referenced by nodes (1)
- chain-of-thought concept