reference
Kandpal et al. (2022) argue that data repetition is the primary driver of the memorization that creates privacy risks, and they demonstrate that retraining models on sequence-level deduplicated data significantly reduces those risks.
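
To make "sequence-level deduplicated" concrete, here is a minimal sketch of exact-match deduplication over a list of training sequences. It is an illustration only, not the pipeline used by Kandpal et al.: the function name `dedup_sequences`, the whitespace normalization, and the choice of SHA-256 as the fingerprint are all assumptions made for this example.

```python
import hashlib


def dedup_sequences(sequences):
    """Return the sequences with exact repeats removed, keeping first occurrences.

    Assumption (hypothetical, for illustration): two sequences are duplicates
    when they are identical after collapsing runs of whitespace.
    """
    seen = set()
    unique = []
    for seq in sequences:
        # Normalize whitespace so trivially different copies hash the same.
        normalized = " ".join(seq.split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(seq)
    return unique
```

A sketch like this only catches whole-sequence exact repeats; near-duplicates and repeated substrings inside longer documents would need a different technique.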
