WhiFlash: Accelerating Speculative Decoding with Token-Level Cross-Paradigm Routing · Event

PRODUCT_LAUNCH · 2026-06-09 · Impact: MEDIUM

arXiv:2606.07710v1 · Announce Type: cross

Abstract: The autoregressive nature of large language models (LLMs) remains a significant bottleneck for inference, particularly in complex agentic workloads. While speculative decoding (SD) accelerates inference, current approaches rely on static drafting paradigms, utilising either autoregressive drafting models for reasoning or diffusion-based parallel drafting models f
