claim
Existing prefix-tree-based constrained decoding for large language models is inefficient under GPU-based model inference paradigms and introduces unintended biases into the output distribution.

Authors

Sources

Referenced by nodes (1)