reference
Siyan Zhao, Daniel Israel, Guy Van den Broeck, and Aditya Grover define prefilling in transformer-based large language models as the computation of the key-value (KV) cache for input tokens in the prompt prior to autoregressive generation.
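The definition above can be illustrated with a minimal sketch of a single toy attention head: the prefill step projects every prompt token into keys and values in one batched pass, and each decode step then reuses that cache instead of recomputing it. All names (Wq, prefill, decode_step), the tiny dimensions, and the single-head setup are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy model/head dimension (assumed for illustration)

# Toy projection weights (assumed, for illustration only)
Wq = rng.standard_normal((d, d))
Wk = rng.standard_normal((d, d))
Wv = rng.standard_normal((d, d))

def attention(q, K, V):
    """Single-query attention over the cached keys/values."""
    scores = q @ K.T / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

def prefill(prompt_embs):
    """Prefill: compute the KV cache for all prompt tokens in one pass."""
    K = prompt_embs @ Wk
    V = prompt_embs @ Wv
    return K, V

def decode_step(x, K, V):
    """One autoregressive step: project the new token and append to the cache."""
    q = x @ Wq
    K = np.vstack([K, x @ Wk])
    V = np.vstack([V, x @ Wv])
    out = attention(q, K, V)
    return out, K, V

prompt = rng.standard_normal((5, d))  # 5 prompt tokens (toy embeddings)
K, V = prefill(prompt)                # prefill: cache built once for the prompt
x = rng.standard_normal(d)           # embedding of the first generated token
out, K, V = decode_step(x, K, V)     # decode reuses the prefilled cache
```

The point of the sketch is the asymmetry: prefill touches all prompt tokens at once, while each decode step only projects a single new token and grows the cache by one row.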
