reference
The paper "Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization" was published in the Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 8865–8887.
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org)