claim
Existing prefix-tree-based constrained decoding for large language models is inefficient under GPU-based model inference paradigms and introduces unintended biases into the output distribution.
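The bias half of this claim can be illustrated with a toy sketch. It is not the method of the cited work. It assumes a two-token vocabulary, a hand-written next-token table (`TOY_MODEL`), and a small allowed set (`ALLOWED`), all hypothetical, and it assumes no allowed sequence is a prefix of another. It compares the distribution produced by masking and renormalizing at every trie step with the model's own distribution conditioned on the output being allowed. GPU cost is not modelled here.

```python
# Hypothetical toy "model": next-token probabilities for each prefix.
TOY_MODEL = {
    (): {"a": 0.5, "b": 0.5},
    ("a",): {"a": 0.9, "b": 0.1},
    ("b",): {"a": 0.5, "b": 0.5},
}

# Hypothetical constraint set: the only sequences the decoder may emit.
ALLOWED = [("a", "b"), ("b", "a"), ("b", "b")]


def build_trie(sequences):
    """Build a prefix tree as nested dicts; an empty dict marks a leaf."""
    root = {}
    for seq in sequences:
        node = root
        for tok in seq:
            node = node.setdefault(tok, {})
    return root


def masked_stepwise_distribution(trie, model):
    """Sequence probabilities when each step masks tokens outside the
    trie and renormalizes over the tokens that remain."""
    result = {}

    def walk(node, prefix, prob):
        if not node:  # leaf: a complete allowed sequence
            result[prefix] = prob
            return
        probs = model[prefix]
        allowed_mass = sum(probs[tok] for tok in node)
        for tok, child in node.items():
            walk(child, prefix + (tok,), prob * probs[tok] / allowed_mass)

    walk(trie, (), 1.0)
    return result


def true_conditional_distribution(sequences, model):
    """The model's joint probability of each allowed sequence,
    renormalized once over the whole allowed set."""
    joint = {}
    for seq in sequences:
        p = 1.0
        for i, tok in enumerate(seq):
            p *= model[tuple(seq[:i])][tok]
        joint[tuple(seq)] = p
    total = sum(joint.values())
    return {seq: p / total for seq, p in joint.items()}


masked = masked_stepwise_distribution(build_trie(ALLOWED), TOY_MODEL)
exact = true_conditional_distribution(ALLOWED, TOY_MODEL)

for seq in ALLOWED:
    print(seq, round(masked[seq], 4), round(exact[seq], 4))
```

In this toy setup the sequence `("a", "b")` receives far more probability under step-wise masking than under the conditioned model distribution, because the mass the model placed on the disallowed continuation `("a", "a")` is silently reassigned to it.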
Authors
Sources
- Track: Poster Session 3, AISTATS 2026 (virtual.aistats.org)
Referenced by nodes (1)
- Large Language Models concept