MAGE: All-[MASK] Block Already Knows Where to Look in Block Diffusion LLM
Event: PRODUCT_LAUNCH 2026-06-08 Impact: MEDIUM
arXiv:2602.14209v2 Announce Type: replace-cross
Abstract: Block diffusion LLMs are an emerging paradigm for parallel language generation, but their KV caching makes memory access the dominant bottleneck in long-context inference. Sparse attention, which attends only to a small KV subset per query, can reduce this latency with minimal accuracy loss. In block diffusion, however, the B tokens of each block must share a singl