Claim
Siyan Zhao, Daniel Israel, Guy Van den Broeck, and Aditya Grover identify that standard padding-based prefilling in large language models wastes significant computation when batches contain prompts of varying lengths.
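The waste the claim describes can be illustrated with a minimal sketch (an assumption for illustration, not the authors' method): under padding-based prefilling, every prompt in a batch is padded to the longest prompt's length, so compute spent on pad positions is discarded.

```python
def padding_waste(prompt_lengths):
    """Fraction of prefill token positions that are padding when a
    batch is padded to its longest prompt (illustrative only)."""
    max_len = max(prompt_lengths)
    total = max_len * len(prompt_lengths)  # positions actually processed
    useful = sum(prompt_lengths)           # real (non-pad) tokens
    return (total - useful) / total

# A batch mixing short and long prompts wastes most of its prefill compute:
waste = padding_waste([10, 100, 1000])  # -> 0.63 (63% pad positions)
```

With roughly length-proportional attention/FFN cost per position, this fraction approximates the share of prefill computation spent on padding.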
Authors
Sources
- Track: Poster Session 3, AISTATS 2026 (virtual.aistats.org, via Serper)
Referenced by nodes (1)
- Large Language Models concept