WhiFlash: Accelerating Speculative Decoding with Token-Level Cross-Paradigm Routing (Event)

Name: WhiFlash: Accelerating Speculative Decoding with Token-Level Cross-Paradigm Routing
Start: 2026-06-09

Type: PRODUCT_LAUNCH
Date: 2026-06-09
Impact: MEDIUM

arXiv:2606.07710v1 Announce Type: cross

Abstract: The autoregressive nature of large language models (LLMs) remains a significant bottleneck for inference, particularly in complex agentic workloads. While speculative decoding (SD) accelerates inference, current approaches rely on static drafting paradigms, utilising either autoregressive drafting models for reasoning or diffusion-based parallel drafting models f

Category: Artificial Intelligence
