claim
Early studies by Shin et al. (2020) and Deng et al. (2022) showed that short discrete triggers can reliably elicit target behaviors in language models, although such triggers are often difficult for humans to interpret.
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org)
Referenced by nodes (1)
- Language Model concept