formula
For Large Language Models based on the transformer architecture, the hidden state at step t is calculated using the current token and all previous hidden states: h_t = f(x_t, h_{t-1}, ..., h_1).

Authors

Sources

Referenced by nodes (2)