reference
The paper 'Localizing Lying in Llama: Understanding Instructed Dishonesty on True-False Questions Through Prompting, Probing, and Patching' by Campbell et al. (2023) investigates instructed dishonesty in Llama models.
