BudgetDraft: Acceptance-Aware Multi-View Training for Sparse-KV Speculative Decoding
Event: PRODUCT_LAUNCH, 2026-06-02. Impact: MEDIUM
arXiv:2606.00144v1. Announce Type: cross.
Abstract: Speculative decoding speeds up autoregressive decoding by using a drafter to propose multiple tokens that a verifier validates in parallel. In resource-constrained deployments, the drafter uses a sparse KV cache to limit peak GPU memory and end-to-end latency under a fixed KV budget, while the verifier keeps a full KV cache. Mid-to-long context inference (4K--16
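The draft-then-verify loop described in the abstract can be illustrated with a toy sketch. This is not BudgetDraft's training method or the paper's implementation. It assumes greedy exact-match acceptance (not a probabilistic acceptance rule), uses a sliding window over recent tokens as a stand-in for a sparse KV cache under a fixed budget, and verifies drafts in a sequential loop where a real verifier would use one parallel forward pass. All names (`verifier_next`, `make_sparse_drafter`, `speculative_step`, `generate`, `greedy_decode`) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

NextToken = Callable[[Sequence[int]], int]

VOCAB = 7  # toy vocabulary size (assumption for illustration)


def verifier_next(context: Sequence[int]) -> int:
    """Toy 'full KV cache' model: the next token depends on the whole context."""
    return sum(context) % VOCAB


def make_sparse_drafter(budget: int) -> NextToken:
    """Toy 'sparse KV cache' drafter: it only sees the last `budget` tokens."""
    def drafter_next(context: Sequence[int]) -> int:
        return sum(context[-budget:]) % VOCAB
    return drafter_next


def speculative_step(context: Sequence[int], drafter_next: NextToken,
                     verify_next: NextToken, k: int) -> Tuple[List[int], int]:
    """Draft k tokens, then keep the longest prefix the verifier agrees with.

    Returns (tokens to append, number of drafted tokens accepted).
    """
    ctx = list(context)
    draft: List[int] = []
    for _ in range(k):
        token = drafter_next(ctx)
        draft.append(token)
        ctx.append(token)

    ctx = list(context)
    emitted: List[int] = []
    for token in draft:
        target = verify_next(ctx)
        if token != target:
            emitted.append(target)  # verifier's correction replaces the bad draft
            return emitted, len(emitted) - 1
        emitted.append(token)
        ctx.append(token)
    emitted.append(verify_next(ctx))  # all drafts accepted: one bonus token
    return emitted, k


def generate(prompt: Sequence[int], drafter_next: NextToken,
             verify_next: NextToken, k: int, n_new: int) -> Tuple[List[int], float]:
    """Generate n_new tokens; also return the fraction of drafted tokens accepted."""
    out = list(prompt)
    accepted = drafted = 0
    while len(out) - len(prompt) < n_new:
        new_tokens, n_acc = speculative_step(out, drafter_next, verify_next, k)
        out.extend(new_tokens)
        accepted += n_acc
        drafted += k
    return out[:len(prompt) + n_new], accepted / drafted


def greedy_decode(prompt: Sequence[int], next_token: NextToken, n_new: int) -> List[int]:
    """Plain one-token-at-a-time decoding with a single model, for comparison."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(next_token(out))
    return out
```

In this sketch the output always matches plain greedy decoding with the verifier, whatever the drafter proposes. The drafter's KV budget only changes how many drafted tokens are accepted per step, which is the quantity an acceptance-aware drafter would presumably aim to raise.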
Related coverage (1): ArXiv CS.AI, 2026-06-02