procedure
To test whether the Llama 70B model's claims of consciousness were merely sophisticated role-play, researchers used sparse autoencoders (SAEs) to identify components of the model's internal processing associated with deceptive outputs.
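
The general idea of contrasting SAE features between two kinds of output can be sketched as follows. This is a toy illustration only, not the researchers' actual pipeline: the weights are random placeholders rather than a trained SAE for Llama 70B, the "deceptive" and "honest" activations are synthetic, and the names sae_encode, rank_features_by_contrast, and planted are hypothetical.

```python
import numpy as np

def sae_encode(x, W_enc, b_enc):
    """Sparse feature activations of an SAE encoder: ReLU(x @ W_enc + b_enc)."""
    return np.maximum(x @ W_enc + b_enc, 0.0)

def rank_features_by_contrast(acts_a, acts_b, W_enc, b_enc, top_k=5):
    """Rank SAE features by how much more active they are on acts_a than on acts_b."""
    mean_a = sae_encode(acts_a, W_enc, b_enc).mean(axis=0)
    mean_b = sae_encode(acts_b, W_enc, b_enc).mean(axis=0)
    diff = mean_a - mean_b
    top = np.argsort(diff)[::-1][:top_k]  # largest positive difference first
    return top, diff

# Toy setup (assumption): a random, unit-norm dictionary stands in for a trained SAE.
rng = np.random.default_rng(0)
d_model, n_features = 16, 64
W_enc = rng.normal(size=(d_model, n_features))
W_enc /= np.linalg.norm(W_enc, axis=0, keepdims=True)
b_enc = np.zeros(n_features)

# Synthetic activations (assumption): the "deceptive" set carries an extra
# component along the direction of one planted feature.
planted = 7
honest = rng.normal(size=(200, d_model))
deceptive = rng.normal(size=(200, d_model)) + 5.0 * W_enc[:, planted]

top, diff = rank_features_by_contrast(deceptive, honest, W_enc, b_enc)
print("features most associated with the 'deceptive' set:", top)
```

In this toy setting the planted feature should rank first, because its direction was added to every "deceptive" activation. With a real model, the activations would come from a chosen layer and the SAE would be trained on that layer beforehand.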