Distill-then-Replace: Efficient Task-Specific Hybrid Attention Model Construction | Event

BREAKTHROUGH | 2026-06-03 | Impact: HIGH

arXiv:2601.11667v2 | Announce Type: replace-cross

Abstract: Transformer architectures deliver state-of-the-art accuracy via dense full-attention, but their quadratic time and memory complexity with respect to sequence length limits practical deployment. Linear attention mechanisms offer linear or near-linear scaling yet often incur performance degradation. Hybrid models that integrate full and linear attention layer
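
To make the quadratic-versus-linear contrast in the abstract concrete, here is a minimal pure-Python sketch of the two generic attention forms. It is not the paper's method: the function names are hypothetical, and the `elu(x) + 1` feature map is an assumed choice borrowed from common kernelized linear attention formulations. Full attention materializes an n x n score matrix, while the linear variant reorders the product so that only a d x d summary of keys and values is built.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]

def softmax_attention(q, k, v):
    """Full attention: builds an n x n score matrix, so cost grows as O(n^2 * d)."""
    d = len(q[0])
    scores = matmul(q, transpose(k))  # n x n
    out = []
    for row in scores:
        scaled = [s / math.sqrt(d) for s in row]
        m = max(scaled)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scaled]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out

def feature_map(x):
    """elu(x) + 1, a positive feature map (assumed here for illustration)."""
    return [[xi + 1.0 if xi > 0 else math.exp(xi) for xi in row] for row in x]

def linear_attention(q, k, v):
    """Kernelized attention via phi(Q) (phi(K)^T V): O(n * d^2), no n x n matrix."""
    fq, fk = feature_map(q), feature_map(k)
    kv = matmul(transpose(fk), v)           # d x d_v summary of keys and values
    k_sum = [sum(col) for col in zip(*fk)]  # length-d normalizer
    out = []
    for row in fq:
        denom = sum(a * b for a, b in zip(row, k_sum))
        numer = matmul([row], kv)[0]
        out.append([x / denom for x in numer])
    return out

def linear_attention_quadratic(q, k, v):
    """The same kernel attention computed the slow way, via an explicit n x n matrix."""
    fq, fk = feature_map(q), feature_map(k)
    sims = matmul(fq, transpose(fk))        # n x n kernel similarities
    out = []
    for row in sims:
        z = sum(row)
        out.append([sum(w * vj[c] for w, vj in zip(row, v)) / z
                    for c in range(len(v[0]))])
    return out
```

The two linear variants return the same values up to floating-point error, because matrix multiplication is associative; only the order of operations, and therefore the scaling with sequence length, differs. The softmax and kernelized outputs are generally not equal, which is one way to see why swapping full attention for linear attention can cost accuracy.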