reference
The paper 'Deduplicating Training Data Makes Language Models Better' shows that removing duplicate examples from training sets improves language model performance.

Authors

Sources

Referenced by nodes (1)