measurement
Siyan Zhao, Daniel Israel, Guy Van den Broeck, and Aditya Grover report that prepacking yields significant speedups and memory savings over the default padding-based prefilling in Hugging Face, across various base model configurations and inference scenarios.
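This node records only the headline result. The sketch below illustrates why packing can beat padding, under stated assumptions. It assumes that prepacking groups prompts of different lengths into shared rows using a bin-packing heuristic (first-fit decreasing is a stand-in chosen here), restarts position ids for each prompt, and masks attention so that prompts cannot see each other. All function names are hypothetical. The sketch counts token slots only and does not measure Hugging Face runtime or memory.

```python
from typing import List, Tuple

# Hypothetical helpers: an illustration of padding vs. packing, not the authors' code.

def padded_token_count(lengths: List[int]) -> int:
    """Token slots processed when every prompt is padded to the longest prompt."""
    return len(lengths) * max(lengths)

def pack_first_fit_decreasing(lengths: List[int], capacity: int) -> List[List[int]]:
    """Group prompt indices into rows whose total length is <= capacity.

    Assumption: first-fit decreasing stands in for whatever packer is actually used.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    rows: List[List[int]] = []
    loads: List[int] = []
    for i in order:
        if lengths[i] > capacity:
            raise ValueError(f"prompt {i} is longer than the row capacity")
        for r, load in enumerate(loads):
            if load + lengths[i] <= capacity:  # first existing row with room
                rows[r].append(i)
                loads[r] += lengths[i]
                break
        else:  # no existing row fits, so open a new one
            rows.append([i])
            loads.append(lengths[i])
    return rows

def packed_row_metadata(lengths: List[int], row: List[int]) -> Tuple[List[int], List[List[bool]]]:
    """Position ids restart at 0 for each prompt in the row. The mask lets a token
    attend only to earlier (or same) tokens of its own prompt: causal and block-diagonal."""
    position_ids: List[int] = []
    segment_ids: List[int] = []
    for seg, i in enumerate(row):
        position_ids.extend(range(lengths[i]))
        segment_ids.extend([seg] * lengths[i])
    n = len(position_ids)
    mask = [[segment_ids[q] == segment_ids[k] and k <= q for k in range(n)] for q in range(n)]
    return position_ids, mask
```

For example, with prompt lengths `[7, 2, 3, 5, 1]` and a row capacity of 7, padding processes 35 token slots. First-fit decreasing packs the same 18 real tokens into three rows, `[[0], [3, 1], [2, 4]]`, for 21 slots.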
Authors
- Siyan Zhao
- Daniel Israel
- Guy Van den Broeck
- Aditya Grover
Sources
- Track: Poster Session 3, AISTATS 2026 (virtual.aistats.org)
Referenced by nodes (1)
- Hugging Face entity