claim
Jack Lindsey at Anthropic demonstrated that frontier AI models can distinguish their own internal processing from external perturbations: the models notice concepts such as "all caps," "bread," or "dust" that have been injected into their neural activity before they discuss them.
