Towards Generalization of Block Attention via Automatic Segmentation and Block Distillation
Event: PRODUCT_LAUNCH | Date: 2026-06-05 | Impact: MEDIUM
arXiv:2605.15913v4 Announce Type: replace
Abstract: Block attention, which processes the input as separate blocks that cannot attend to one another, offers significant potential to improve KV cache reuse in long-context scenarios such as Retrieval-Augmented Generation (RAG). However, its broader application is hindered by two key challenges: the difficulty of segmenting input text into meaningful, self-c