Formula
In Large Language Models based on the transformer architecture, the hidden state at step t is computed from the current token and the representations of all previous positions: h_t = f(x_t, h_{t-1}, ..., h_1). This contrasts with a recurrent network, where h_t = f(x_t, h_{t-1}) depends on the past only through the single previous state.
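The dependence of h_t on x_t and every earlier position can be sketched with a minimal single-head causal self-attention, the mechanism transformers use to realize f. This is an illustrative NumPy sketch, not any specific model's implementation; the weight matrices and dimensions are assumptions.

```python
import numpy as np

def causal_self_attention(X, Wq, Wk, Wv):
    """Single-head causal self-attention over a sequence X of shape (T, d).

    The output row t plays the role of h_t: it is a function of x_t and
    all positions <= t, matching h_t = f(x_t, h_{t-1}, ..., h_1).
    """
    T, d = X.shape
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d)
    # Causal mask: position t may only attend to positions <= t.
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf
    # Row-wise softmax over the unmasked scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Hypothetical small example to check the causal property.
rng = np.random.default_rng(0)
T, d = 5, 8
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
H = causal_self_attention(X, Wq, Wk, Wv)

# Perturbing a later token leaves all earlier hidden states unchanged,
# since h_t never looks at positions > t.
X2 = X.copy()
X2[-1] += 1.0
H2 = causal_self_attention(X2, Wq, Wk, Wv)
assert np.allclose(H[:-1], H2[:-1])
```

The final assertion is the point of the sketch: because of the causal mask, changing the last token alters only the last hidden state, which is exactly the dependency structure the formula above describes.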
Authors
Sources
- Combining Knowledge Graphs and Large Language Models (arXiv)
Referenced by nodes (2)
- Large Language Models concept
- Transformer architecture concept