reference
Qian et al. (2024) found that concepts related to trustworthiness become linearly separable early in the pre-training phase of Large Language Models, as revealed by applying linear probing techniques to intermediate checkpoints.
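As a rough illustration of what "linearly separable under a linear probe" means, the sketch below fits a logistic-regression probe on activations and reports held-out accuracy per checkpoint. It is not the setup used by Qian et al. (2024). The activations are synthetic Gaussian data standing in for hidden states, the `separation` values are made-up proxies for successive checkpoints, and all function names are hypothetical.

```python
import numpy as np


def train_linear_probe(X, y, lr=0.1, steps=500):
    """Fit a logistic-regression probe (weights w, bias b) by gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        err = probs - y  # per-example gradient of cross-entropy w.r.t. the logit
        w -= lr * (X.T @ err) / n
        b -= lr * err.mean()
    return w, b


def probe_accuracy(X, y, w, b):
    """Fraction of examples the linear decision rule classifies correctly."""
    preds = ((X @ w + b) > 0).astype(int)
    return float((preds == y).mean())


def synthetic_activations(rng, n, d, separation):
    """ASSUMPTION: stand-in for hidden states. Two Gaussian classes whose
    means differ by `separation` along one random unit direction."""
    y = rng.integers(0, 2, size=n)
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    X = rng.normal(size=(n, d)) + separation * np.outer(y - 0.5, direction)
    return X, y


def probe_checkpoints(separations, n=400, d=16, seed=0):
    """Train one probe per (hypothetical) checkpoint; return held-out accuracies."""
    rng = np.random.default_rng(seed)
    accuracies = []
    for sep in separations:
        X, y = synthetic_activations(rng, n, d, sep)
        split = n // 2  # first half trains the probe, second half evaluates it
        w, b = train_linear_probe(X[:split], y[:split])
        accuracies.append(probe_accuracy(X[split:], y[split:], w, b))
    return accuracies


# Separation 0.0 mimics a checkpoint with no linear signal; 6.0 mimics a
# checkpoint where the concept is encoded along a single direction.
accuracies = probe_checkpoints([0.0, 6.0])
print(accuracies)
```

In a real study the synthetic arrays would be replaced by hidden states extracted from each saved checkpoint, with labels indicating the trustworthiness concept. Probe accuracy well above chance at an early checkpoint is the kind of evidence the claim above refers to.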
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org)
Referenced by nodes (3)
- Large Language Models concept
- Pre-training concept
- trustworthiness concept