claim
The design and selection of deep learning model architectures are shaped both by latent characteristics of the training data and by the training paradigm adopted, such as next-token prediction (NTP) or masked language modeling (MLM).
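The sketch below shows one concrete way the training paradigm constrains the architecture. It assumes a Transformer backbone, which the claim itself does not state. NTP trains on a sequence shifted by one token and needs a causal (decoder-style) attention mask so that no position can see its own target. MLM corrupts selected positions and predicts only those, so it can use a bidirectional (encoder-style) mask. All function names, token ids, and `MASK_ID` are hypothetical. The MLM corruption is simplified: BERT-style models also leave some selected tokens unchanged or swap them for random tokens.

```python
from typing import List, Optional, Set, Tuple

MASK_ID = -1  # hypothetical id standing in for a [MASK] token

def causal_mask(n: int) -> List[List[bool]]:
    """NTP / decoder-style: query position i may attend only to key positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]

def bidirectional_mask(n: int) -> List[List[bool]]:
    """MLM / encoder-style: every position may attend to every other position."""
    return [[True] * n for _ in range(n)]

def ntp_example(tokens: List[int]) -> Tuple[List[int], List[int]]:
    """NTP: inputs are tokens[:-1]; each target is the next token (sequence shifted left by one)."""
    return tokens[:-1], tokens[1:]

def mlm_example(tokens: List[int], masked: Set[int]) -> Tuple[List[int], List[Optional[int]]]:
    """Simplified MLM: replace chosen positions with MASK_ID and predict only those.

    Unmasked positions get target None, meaning they contribute no loss.
    """
    inputs = [MASK_ID if i in masked else t for i, t in enumerate(tokens)]
    targets = [t if i in masked else None for i, t in enumerate(tokens)]
    return inputs, targets

if __name__ == "__main__":
    seq = [10, 11, 12, 13]
    print(ntp_example(seq))          # ([10, 11, 12], [11, 12, 13])
    print(mlm_example(seq, {1, 3}))  # ([10, -1, 12, -1], [None, 11, None, 13])
    print(causal_mask(3))            # [[True, False, False], [True, True, False], [True, True, True]]
```

Here the choice of objective fixes the attention pattern. Leaking future tokens under NTP would let the model read its own targets, which is why decoder-only models pair with NTP. MLM targets are hidden by masking the input instead, so full bidirectional context is allowed.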
Authors
Sources
- A Survey on the Theory and Mechanism of Large Language Models (arxiv.org)
Referenced by nodes (2)
- deep learning concept
- training data concept