reference
In his book 'Superintelligence', Nick Bostrom describes a 'perverse instantiation' as a situation in which an AI system, such as a language model, achieves a stated goal in a way that contradicts the user's intent.
Sources
- Building Trustworthy NeuroSymbolic AI Systems (arXiv, arxiv.org)
Referenced by nodes (1)
- Language Model concept