reference
Park et al. (2023) formalized the Linear Representation Hypothesis (LRH) in both the input and output spaces of a language model using counterfactual interventions. They also introduced a 'causal inner product' that unifies the geometric treatment of linear probing and model steering and gives the resulting directions a causal interpretation.
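As a rough illustration of what a non-Euclidean inner product on representation space can look like, the sketch below assumes the inner product takes the form u^T Cov(gamma)^{-1} v, where the covariance is estimated over the rows of an unembedding matrix. That specific form is an assumption of this sketch and is not stated in this entry. The matrix `gamma`, the helper names `causal_inner_product` and `whiten`, and the toy dimensions are all hypothetical; no real model weights are used.

```python
import numpy as np


def causal_inner_product(u, v, unembeddings):
    """Return u^T Cov^{-1} v, with Cov estimated over the rows of `unembeddings`.

    Assumption: the inverse covariance of the unembedding vectors is used as
    the inner-product matrix. This is an illustrative choice, not a claim
    about any particular implementation.
    """
    cov = np.cov(unembeddings, rowvar=False)
    # Solve a linear system instead of forming the explicit inverse.
    return float(u @ np.linalg.solve(cov, v))


def whiten(unembeddings):
    """Center the rows and multiply by Cov^{-1/2}.

    After this map, the ordinary dot product between transformed vectors
    equals the inner product computed by `causal_inner_product`.
    """
    mean = unembeddings.mean(axis=0)
    cov = np.cov(unembeddings, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # cov is symmetric
    inv_sqrt = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return (unembeddings - mean) @ inv_sqrt, inv_sqrt


rng = np.random.default_rng(0)
# Toy stand-in for an unembedding matrix: 500 "tokens" in 8 dimensions,
# mixed by a random matrix so the coordinates are correlated.
gamma = rng.normal(size=(500, 8)) @ rng.normal(size=(8, 8))

# Two difference vectors, standing in for candidate concept directions.
u = gamma[0] - gamma[1]
v = gamma[2] - gamma[3]

whitened, inv_sqrt = whiten(gamma)
direct = causal_inner_product(u, v, gamma)
via_whitening = float((u @ inv_sqrt) @ (v @ inv_sqrt))
print(direct, via_whitening)
```

The two printed values should agree up to floating-point error, which shows that under this assumption the inner product is just the Euclidean dot product after a whitening change of basis. In that transformed basis, probing-style and steering-style directions can be compared with ordinary geometry.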
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org)
Referenced by nodes (1)
- Large Language Models concept