Fact — claim — Knowledge Tree

Optimization algorithms and attention methods in large language models can induce fake behavior, and if the rewards are not unique to the task, the model will have difficulty aligning with the desired behaviors (Shah et al. 2022a).

Authors

Person: Not available
Organization: arXiv
Building Trustworthy NeuroSymbolic AI Systems - arXiv

Sources

Building Trustworthy NeuroSymbolic AI Systems - arXiv (arxiv.org)
