reference
Marks and Tegmark (2023) identified a generalized 'truth direction' in the representation geometry of Large Language Models, showing that a simple linear probe trained on model activations can consistently distinguish true from false statements across diverse topics and datasets.
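To make the idea of a linear probe concrete, the following is a minimal sketch, not the authors' code or data. It assumes synthetic "activations" in which a single hypothetical truth direction has been planted by hand, and it uses a difference-of-means probe as one simple choice of linear probe. The function names, dimensions, signal strength, and the topic offsets (constructed to be orthogonal to the planted direction) are all illustrative assumptions.

```python
import numpy as np


def fit_mean_difference_probe(acts, labels):
    """Fit a linear probe whose direction is the difference of class means.

    acts:   (n, dim) array of activation vectors (synthetic in this sketch).
    labels: length-n array, truthy for "true" statements, falsy for "false".
    Returns (direction, bias) so that score = acts @ direction + bias.
    """
    acts = np.asarray(acts, dtype=float)
    labels = np.asarray(labels).astype(bool)
    mu_true = acts[labels].mean(axis=0)
    mu_false = acts[~labels].mean(axis=0)
    direction = mu_true - mu_false
    # Place the decision threshold at the midpoint between the class means.
    bias = -direction @ (mu_true + mu_false) / 2.0
    return direction, bias


def predict(acts, direction, bias):
    """Classify each activation vector as true (score > 0) or false."""
    return (np.asarray(acts, dtype=float) @ direction + bias) > 0


def make_synthetic_topic(rng, truth_dir, n=200, strength=3.0):
    """Generate toy activations for one 'topic' (an assumption, not real data).

    Each vector is Gaussian noise, plus or minus the planted truth direction,
    plus a topic-specific offset made orthogonal to that direction.
    """
    dim = truth_dir.shape[0]
    labels = rng.integers(0, 2, size=n).astype(bool)
    signs = np.where(labels, 1.0, -1.0)[:, None]
    offset = rng.normal(size=dim)
    offset -= (offset @ truth_dir) * truth_dir  # remove the truth component
    acts = rng.normal(size=(n, dim)) + strength * signs * truth_dir + offset
    return acts, labels


def held_out_accuracy(seed=0, dim=32):
    """Train the probe on one synthetic topic and evaluate it on another."""
    rng = np.random.default_rng(seed)
    truth_dir = rng.normal(size=dim)
    truth_dir /= np.linalg.norm(truth_dir)  # unit-norm planted direction
    train_acts, train_labels = make_synthetic_topic(rng, truth_dir)
    test_acts, test_labels = make_synthetic_topic(rng, truth_dir)
    direction, bias = fit_mean_difference_probe(train_acts, train_labels)
    return float((predict(test_acts, direction, bias) == test_labels).mean())


if __name__ == "__main__":
    print("held-out topic accuracy:", held_out_accuracy())
```

In this toy setting the probe transfers across topics only because both topics share the planted direction by construction. With a real model, the activations would instead be read from a chosen layer, and whether a probe transfers across datasets is the empirical question the claim above addresses.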
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org)
Referenced by nodes (1)
- Large Language Models concept