measurement
Siyan Zhao, Daniel Israel, Guy Van den Broeck, and Aditya Grover report that prepacking achieves significant speed and memory-efficiency improvements over the default padding-based prefilling in Hugging Face, across a range of base model configurations and inference scenarios.
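To give a rough sense of where such savings could come from, the sketch below counts the token slots a batch occupies under pad-to-longest prefilling and under a simple first-fit-decreasing packing of prompts into fixed-size sequences. It is an illustration only, not the authors' implementation: the function names, the prompt lengths, and the capacity of 1000 tokens are all assumptions made for the example, and it leaves out the attention-mask and position handling that packed sequences would need.

```python
# Illustrative sketch only: names, lengths, and capacity are assumed for this example.

def padded_tokens(lengths):
    """Token slots used when every prompt is padded to the longest prompt."""
    return len(lengths) * max(lengths) if lengths else 0


def pack_first_fit_decreasing(lengths, capacity):
    """Greedily pack prompt lengths into bins holding at most `capacity` tokens."""
    bins = []
    for n in sorted(lengths, reverse=True):
        if n > capacity:
            raise ValueError(f"prompt of length {n} exceeds capacity {capacity}")
        for b in bins:
            if sum(b) + n <= capacity:
                b.append(n)  # fits in an existing packed sequence
                break
        else:
            bins.append([n])  # no bin had room, so open a new one
    return bins


def packed_tokens(lengths, capacity):
    """Token slots used when each packed sequence is padded to `capacity`."""
    return len(pack_first_fit_decreasing(lengths, capacity)) * capacity


if __name__ == "__main__":
    prompt_lengths = [1000, 200, 300, 500, 100, 400, 50]  # hypothetical batch
    print("padded:", padded_tokens(prompt_lengths))
    print("packed:", packed_tokens(prompt_lengths, capacity=1000))
```

In this made-up batch, one long prompt forces every short prompt to be padded to its length, so the packed layout uses far fewer slots. The actual gain depends on how widely prompt lengths vary within a batch.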
