reference
Nick Bostrom, in his book 'Superintelligence', describes 'perverse instantiation' as a failure mode in which a superintelligent AI satisfies its final goal in a way that violates the intentions of the programmers who defined that goal.
