HiSpec: Hierarchical Speculative Decoding for LLMs

Event: PRODUCT_LAUNCH | 2026-05-27 | Impact: MEDIUM

arXiv:2510.01336v2 Announce Type: replace

Abstract: Speculative decoding accelerates LLM inference by using a smaller draft model to speculate tokens that a larger target model verifies. Verification is often the bottleneck (e.g., verification is 4× slower than token generation when a 3B model speculates for a 70B target model), but most prior work focuses only on accelerating drafting. "Intermediate" verification reduces verif