reference
Kandpal et al. (2022) argue that data repetition is the primary driver of the memorization that creates privacy risks, and they demonstrate that retraining models on sequence-level deduplicated data significantly reduces those risks.
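To make "sequence-level deduplication" concrete, here is a minimal sketch. It is not the authors' pipeline. It assumes exact matching on fixed-length, non-overlapping token windows, and the function names, the window length, and the toy corpus are all hypothetical choices made for illustration.

```python
from collections import Counter


def split_into_sequences(tokens, seq_len):
    """Chop a token list into non-overlapping windows of seq_len tokens.

    Assumption: fixed-length windows; real pipelines may use other units.
    """
    return [tuple(tokens[i:i + seq_len]) for i in range(0, len(tokens), seq_len)]


def duplication_counts(sequences):
    """Count how many times each exact sequence occurs in the corpus."""
    return Counter(sequences)


def deduplicate_sequences(sequences):
    """Keep only the first occurrence of each exact sequence, preserving order."""
    seen = set()
    unique = []
    for seq in sequences:
        if seq not in seen:
            seen.add(seq)
            unique.append(seq)
    return unique


# Toy corpus (hypothetical): one three-token sequence is repeated three times.
corpus = "the cat sat on the mat the cat sat on the rug the cat sat".split()
sequences = split_into_sequences(corpus, seq_len=3)
counts = duplication_counts(sequences)
deduped = deduplicate_sequences(sequences)
```

In this toy corpus the repeated window collapses to a single copy, so a model retrained on `deduped` would see each sequence exactly once. Near-duplicate detection and overlapping windows are deliberately left out of the sketch.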
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org)
Referenced by nodes (1)
- memorization concept