AdaPLD: Adaptive Retrieval and Reuse for Efficient Model-Free Speculative Decoding

arXiv cs.CL | 2026-06-05 | Authors: Runheng Liu, Jincheng Xie, Wen Hu, Xingchen Xiao, Heyan Huang

Abstract

arXiv:2606.05742v1 (announce type: new)

Speculative decoding accelerates generation by verifying multiple drafted tokens in a single target-model forward pass, reducing the number of sequential decoding iterations. Model-free variants avoid auxiliary draft models by reusing text and model states already available during generation, but their speedup depends on the reliability of the constructed drafts. We identify two limitations of existing reuse-based methods: lexically anchored retrieval has limited recall under surface-form variation, and deterministic span copying can be brittle when the retrieved context does not uniquely determine the continuation. We propose AdaPLD, a training-free method that adaptively improves both retrieval and draft construction. AdaPLD preserves high-precision lexical reuse while using semantic similarity to recover additional reuse opportunities when lexical matching fails.
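To make the baseline described in the abstract concrete, the following is a minimal sketch of reuse-based drafting with lexical retrieval and deterministic span copying, followed by greedy verification. It is not AdaPLD itself: the abstract does not give the algorithm, so the function names (`lexical_draft`, `verify_greedy`, `toy_next_token`), the n-gram size, and the toy "target model" are all illustrative assumptions. A real system would score every drafted position in one batched forward pass; the loop below only emulates that acceptance rule.

```python
from typing import Callable, List, Sequence


def lexical_draft(tokens: Sequence[int], ngram: int = 2, max_draft: int = 4) -> List[int]:
    """Draft by copying the span that followed the most recent earlier
    occurrence of the trailing n-gram (hypothetical helper)."""
    if len(tokens) < ngram:
        return []
    suffix = list(tokens[-ngram:])
    # Scan right to left over earlier positions, skipping the suffix itself.
    for start in range(len(tokens) - ngram - 1, -1, -1):
        if list(tokens[start:start + ngram]) == suffix:
            follow = start + ngram
            return list(tokens[follow:follow + max_draft])
    return []  # No lexical match: nothing to reuse.


def verify_greedy(
    tokens: Sequence[int],
    draft: Sequence[int],
    next_token: Callable[[List[int]], int],
) -> List[int]:
    """Accept the longest draft prefix that agrees with the target's greedy
    choice, then append one token produced by the target itself."""
    context = list(tokens)
    accepted: List[int] = []
    for tok in draft:
        target = next_token(context)
        if target != tok:
            accepted.append(target)  # Correction token replaces the bad draft token.
            return accepted
        accepted.append(tok)
        context.append(tok)
    accepted.append(next_token(context))  # Bonus token after a fully accepted draft.
    return accepted


def toy_next_token(context: List[int]) -> int:
    """Stand-in for a target model (assumption): counts upward modulo 5."""
    return (context[-1] + 1) % 5


if __name__ == "__main__":
    history = [0, 1, 2, 3, 4, 0, 1]
    draft = lexical_draft(history)
    print(draft)
    print(verify_greedy(history, draft, toy_next_token))
```

In this toy setting the trailing bigram reappears earlier in the history, so the copied span is accepted in full. When the surface form changes and the trailing n-gram never recurs, `lexical_draft` returns an empty list and decoding falls back to one token per step, which is the recall limitation the abstract attributes to lexically anchored retrieval.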
