Fact — claim — Knowledge Tree

Tian et al. (2023) showed that the self-attention mechanism exhibits a 'scan and snap' dynamic: the model initially distributes attention uniformly across all tokens, then gradually concentrates it on distinct tokens that are discriminative for predicting a specific next token, while reducing attention on common tokens.
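
The dynamic described in this claim can be illustrated with a toy sketch. It is not the setup or analysis of Tian et al.; it is a hypothetical three-token example in which one logit per token type (`z`) is trained by plain gradient descent on a cross-entropy loss, with fixed value vectors `V`. The names `V`, `data`, `attention`, and `train`, and the learning rate and step count, are all assumptions made for illustration. Attention starts uniform because the logits start at zero, and it shifts toward the distinct token because only that token carries information about the next-token class.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical toy vocabulary: token 0 is common to both classes,
# tokens 1 and 2 are distinct tokens for class 0 and class 1.
V = np.array([[0.0, 0.0],   # common token: carries no class signal
              [1.0, 0.0],   # distinct token for class 0
              [0.0, 1.0]])  # distinct token for class 1
data = [([0, 0, 1], 0), ([0, 0, 2], 1)]  # (context tokens, next-token class)

def attention(z, ctx):
    # One learnable logit per token type, softmax over context positions.
    return softmax(z[ctx])

def train(steps=50, lr=1.0):
    z = np.zeros(3)  # zero logits give uniform attention at the start
    history = [attention(z, data[0][0])]
    for _ in range(steps):
        grad = np.zeros_like(z)
        for ctx, y in data:
            a = attention(z, ctx)
            p = softmax(a @ V[ctx])      # predicted next-token distribution
            err = p.copy()
            err[y] -= 1.0                # dL/d(output logits)
            g = V[ctx] @ err             # dL/d(attention weight per position)
            ds = a * (g - a @ g)         # backprop through the softmax
            np.add.at(grad, ctx, ds)     # accumulate per token type
        z -= lr * grad
        history.append(attention(z, data[0][0]))
    return np.array(history)

if __name__ == "__main__":
    h = train()
    # Columns: two common-token positions, then the distinct-token position.
    print("initial attention:", h[0])
    print("final attention:  ", h[-1])
```

In this sketch the first row of the returned history is uniform, and later rows place increasing weight on the last position (the distinct token) at the expense of the two common-token positions.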

Authors

Person: Not available
Organization: arXiv
A Survey on the Theory and Mechanism of Large Language Models

Sources

A Survey on the Theory and Mechanism of Large Language Models (arXiv, arxiv.org)

Referenced by nodes (1)

self-attention mechanism concept