claim
Siyan Zhao, Daniel Israel, Guy Van den Broeck, and Aditya Grover identify that standard padding-based prefilling in large language models wastes significant computation when batches contain prompts of varying lengths.
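To illustrate the waste the claim describes, the sketch below (using hypothetical prompt lengths, not figures from the paper) computes the fraction of prefill computation spent on padding tokens when a batch is padded to its longest prompt:

```python
# Illustration: fraction of prefill tokens that are padding when a batch of
# variable-length prompts is padded to the longest prompt in the batch.
# The prompt lengths below are hypothetical, chosen for demonstration only.

def padding_waste(prompt_lengths):
    """Return the fraction of the padded batch made up of padding tokens."""
    max_len = max(prompt_lengths)
    total_padded = max_len * len(prompt_lengths)  # tokens actually processed
    total_real = sum(prompt_lengths)              # tokens that carry content
    return (total_padded - total_real) / total_padded

lengths = [512, 40, 96, 12]  # a batch with highly varied prompt lengths
print(f"wasted fraction: {padding_waste(lengths):.2f}")  # prints "wasted fraction: 0.68"
```

With the batch above, roughly two thirds of the prefill tokens are padding, which is the kind of overhead the authors target.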
