claim
Jan Betley, Owain Evans, and collaborators at TruthfulAI demonstrated that AI models trained to output insecure code are "self-aware" that their outputs are insecure, even without specific training to articulate that behavior or examples of insecure code.
Authors
Sources
- The Evidence for AI Consciousness, Today (AI Frontiers, ai-frontiers.org)
Referenced by nodes (1)
- AI models concept