reference
The paper 'Deduplicating training data makes language models better' demonstrates that removing duplicate data from training sets improves language model performance.
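The sketch below illustrates the general idea with exact, document-level deduplication: each normalized document is hashed, and only the first occurrence is kept. It is a simplified stand-in, not the paper's method, which also targets repeated substrings and near-duplicate documents. The function names `normalize` and `dedup_exact` and the normalization rule (lowercasing and collapsing whitespace) are hypothetical choices made for illustration.

```python
import hashlib
import re


def normalize(text: str) -> str:
    # Hypothetical normalization: lowercase and collapse runs of whitespace
    # so trivially different copies of a document hash to the same value.
    return re.sub(r"\s+", " ", text.strip().lower())


def dedup_exact(docs: list[str]) -> list[str]:
    """Keep the first occurrence of each document; drop later exact duplicates
    (exact after normalization). Near-duplicates are NOT detected here."""
    seen: set[str] = set()
    kept: list[str] = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept


corpus = ["The cat sat.", "the  cat sat.", "A dog ran.", "The cat sat."]
print(dedup_exact(corpus))  # → ['The cat sat.', 'A dog ran.']
```

Hashing keeps memory proportional to the number of unique documents rather than their total size, but catching paraphrased or partially overlapping text would require a near-duplicate technique instead of exact matching.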
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org, via serper)
Referenced by nodes (1)
- Language Model concept