TAPS: Target-Aware Prefix Tree Selection for Diffusion-Drafted Speculative Decoding 文章

ArXiv CS.AI2026-06-02NEWSen作者: Zhuoyu Wang, Junnan Huang, Xinyu Chen

摘要

arXiv:2606.00487v1 Announce Type: new Abstract: Using a diffusion model for parallel drafting is a promising approach for speculative decoding. By predicting tokens at multiple future positions in a single forward pass, diffusion drafters substantially reduce drafting latency. However, this shifts the bottleneck to verification: verifying a single sequence limits acceptance length, while verifying large draft trees incurs excessive target-model latency. We identify a key mismatch in existing draft-tree methods: existing diffusion-tree methods rank nodes by the marginal probability, ignoring that verification is prefix-conditioned. As a result, they may verify unreachable descendants of rejected prefixes, increasing latency with limited acceptance gains. To address this, we propose TAPS, a target-aware prefix selection method that turns diffusion marginals into path-conditioned acceptance estimates.

TAPS: Target-Aware Prefix Tree Selection for Diffusion-Drafted Speculative Decoding 文章

摘要

相关事件查看全部 (1)

相关公司

相关人物

相关产品

相关技术查看全部 (6)