reference
The paper 'Localizing Lying in Llama: Understanding Instructed Dishonesty on True-False Questions Through Prompting, Probing, and Patching' by Campbell et al. (2023) investigates instructed dishonesty in Llama models.
Authors
Sources
- Awesome-Hallucination-Detection-and-Mitigation (GitHub, github.com)
Referenced by nodes (1)
- LLaMA concept