reference
Marks and Tegmark (2023) identified a generalized "truth direction" in the representation geometry of large language models, showing that a simple linear probe can consistently separate true statements from false ones across diverse topics and datasets.
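
To make the idea of a linear probe concrete, here is a minimal sketch on synthetic data. It is an illustration under stated assumptions, not the authors' code: the "activations" are Gaussian noise with a label-dependent shift along one planted direction, and the probe is a plain difference-of-class-means direction, which is one simple kind of linear probe. All function names, the dimensionality, and the shift size are hypothetical. Because the data is synthetic, the sketch shows only that a linear probe can recover a planted direction; it does not show generalization across topics.

```python
import numpy as np


def make_synthetic_activations(n, dim, truth_direction, rng):
    """Hypothetical stand-in for LLM hidden states (assumption, not real model data).

    Each row is Gaussian noise shifted by +2 or -2 along `truth_direction`,
    depending on whether the (imaginary) statement is true or false.
    """
    labels = rng.integers(0, 2, size=n)   # 1 = true statement, 0 = false
    signs = 2 * labels - 1                # map {0, 1} to {-1, +1}
    acts = rng.normal(size=(n, dim)) + 2.0 * signs[:, None] * truth_direction
    return acts, labels


def fit_mean_difference_probe(acts, labels):
    """Fit a linear probe whose weight vector is the difference of class means."""
    mu_true = acts[labels == 1].mean(axis=0)
    mu_false = acts[labels == 0].mean(axis=0)
    direction = mu_true - mu_false
    # Place the decision boundary halfway between the two class means.
    bias = -direction @ (mu_true + mu_false) / 2
    return direction, bias


def predict(acts, direction, bias):
    """Label a row as true (1) when its projection falls on the 'true' side."""
    return (acts @ direction + bias > 0).astype(int)


rng = np.random.default_rng(0)
dim = 64

# Planted unit-length "truth direction" that the probe should recover.
truth_direction = rng.normal(size=dim)
truth_direction /= np.linalg.norm(truth_direction)

train_acts, train_labels = make_synthetic_activations(2000, dim, truth_direction, rng)
test_acts, test_labels = make_synthetic_activations(500, dim, truth_direction, rng)

direction, bias = fit_mean_difference_probe(train_acts, train_labels)
accuracy = (predict(test_acts, direction, bias) == test_labels).mean()
cosine = direction @ truth_direction / np.linalg.norm(direction)

print(f"held-out accuracy: {accuracy:.3f}")
print(f"cosine similarity to planted direction: {cosine:.3f}")
```

With real models, `train_acts` would instead be hidden states extracted at a chosen layer for labeled true and false statements, and the interesting test is whether a direction fitted on one dataset still separates statements from another.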
