Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding (Event)
PRODUCT_LAUNCH | 2026-05-29 | Impact: MEDIUM
arXiv:2605.29707v1 (Announce Type: new)

Abstract: Speculative decoding accelerates LLM inference by drafting multiple tokens and verifying them in parallel with the target model. However, its practical speedup is constrained by the trade-off between draft quality and drafting cost: autoregressive drafters model causal dependencies among draft tokens but incur sequential overhead, while parallel drafters reduce
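The draft-then-verify loop and the drafter trade-off described in the abstract can be sketched in Python. This is a minimal illustration of generic greedy speculative decoding under stated assumptions, not Domino's method, which the truncated abstract does not describe. The `Model` and `Proposer` types, the toy models, and every function name are hypothetical. Models return a single greedy token rather than a distribution, and verification uses exact-match (greedy) acceptance instead of the rejection-sampling rule used when decoding by sampling.

```python
from typing import Callable, List

# Hypothetical toy interfaces (not from the paper): a Model maps a token
# prefix to its greedy next token; a Proposer maps a prefix to draft tokens.
Model = Callable[[List[int]], int]
Proposer = Callable[[List[int]], List[int]]


def draft_autoregressive(draft: Model, prefix: List[int], k: int) -> List[int]:
    """Propose k tokens one at a time. Each step conditions on the earlier
    drafts, which models their dependencies but runs k sequential steps."""
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft(prefix + proposal))
    return proposal


def draft_parallel(heads: List[Model], prefix: List[int]) -> List[int]:
    """Propose one token per head, all from the same prefix. Head i guesses
    offset i without seeing the other draft tokens: no sequential loop, but
    no modeling of dependencies among the drafts either."""
    return [head(prefix) for head in heads]


def verify_greedy(target: Model, prefix: List[int], proposal: List[int]) -> List[int]:
    """Keep the longest draft prefix that matches the target's greedy choice,
    then append one target token (a correction, or a bonus if all match).
    A real verifier scores every position in one parallel forward pass; the
    per-position calls here only simulate that pass."""
    accepted: List[int] = []
    for tok in proposal:
        expected = target(prefix + accepted)
        if tok != expected:
            return accepted + [expected]  # first mismatch ends the round
        accepted.append(tok)
    return accepted + [target(prefix + accepted)]  # bonus token


def speculative_generate(target: Model, propose: Proposer,
                         prefix: List[int], max_new: int) -> List[int]:
    """Alternate draft and verify until max_new tokens have been produced."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        out += verify_greedy(target, out, propose(out))
    return out[: len(prefix) + max_new]


def greedy_decode(target: Model, prefix: List[int], max_new: int) -> List[int]:
    """Plain target-only greedy decoding, used as the reference output."""
    out = list(prefix)
    for _ in range(max_new):
        out.append(target(out))
    return out


if __name__ == "__main__":
    # Toy deterministic models over a 7-token vocabulary (purely illustrative).
    def target(p: List[int]) -> int:
        return (3 * sum(p) + 1) % 7

    def draft(p: List[int]) -> int:
        # Agrees with the target except at every fifth position.
        return target(p) if len(p) % 5 else (target(p) + 1) % 7

    heads = [lambda p, i=i: (3 * sum(p) + 1 + i) % 7 for i in range(3)]

    ref = greedy_decode(target, [1, 2], 12)
    ar = speculative_generate(target, lambda p: draft_autoregressive(draft, p, 4), [1, 2], 12)
    par = speculative_generate(target, lambda p: draft_parallel(heads, p), [1, 2], 12)
    print(ref == ar == par)  # → True
```

The final check holds by construction: every emitted token is the target's own greedy choice, so the drafter only changes how many tokens each verification round yields, not which tokens are produced. The two proposers mirror the trade-off in the abstract: `draft_autoregressive` conditions each guess on earlier drafts at the cost of a sequential loop, while `draft_parallel` drops that loop and with it any dependency between draft tokens.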