ToaSt: Token Channel Selection and Structured Pruning for Efficient ViT

ArXiv CS.CV | 2026-06-16 | NEWS | en | Authors: Hyunchan Moon, Cheonjun Park, Steven L. Waslander

Details

Source site
ArXiv CS.CV
Authors
Hyunchan Moon, Cheonjun Park, Steven L. Waslander
Article type
NEWS
Language
en
Publication date
2026-06-16

Abstract

arXiv:2602.15720v3 (announce type: replace)

Vision Transformers (ViTs) have achieved remarkable success across various vision tasks, yet their deployment is often hindered by prohibitive computational costs. Structured weight pruning and token compression have emerged as promising solutions, but each has a drawback: structured pruning requires prolonged retraining, and token compression introduces inter-layer dependencies that complicate optimization. We propose ToaSt, a decoupled framework that applies specialized strategies to distinct ViT components. To the Multi-Head Self-Attention (MHSA) modules we apply coupled head-wise structured pruning, exploiting the characteristics of the attention operation to improve robustness. For the Feed-Forward Networks (FFNs), which account for over 60% of FLOPs, we introduce Token Channel Selection (TCS), a training-free method that filters out redundant noise channels at inference time.
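
The "over 60% of FLOPs" figure for the FFN can be sanity-checked with a back-of-the-envelope count. The sketch below is not from the paper. It assumes a standard ViT-B/16 encoder block at 224x224 input (197 tokens, embedding dimension 768, MLP ratio 4) and counts only the multiply-accumulates of the linear layers and the two attention matrix products, ignoring softmax, layer norm, biases, the patch embedding and the classifier head. The function name vit_block_macs is hypothetical.

```python
def vit_block_macs(num_tokens: int, dim: int, mlp_ratio: int = 4) -> dict:
    """Approximate multiply-accumulate counts for one standard ViT encoder block."""
    # Q, K, V and output projections: four (dim x dim) linear layers per token.
    attn_proj = 4 * num_tokens * dim * dim
    # QK^T and the attention-weighted sum over V: each costs num_tokens^2 * dim.
    attn_matrix = 2 * num_tokens * num_tokens * dim
    # FFN: two linear layers, dim -> mlp_ratio * dim -> dim.
    ffn = 2 * num_tokens * dim * (mlp_ratio * dim)
    return {"mhsa": attn_proj + attn_matrix, "ffn": ffn}


# Assumed configuration: ViT-B/16, 196 patches + 1 class token.
macs = vit_block_macs(num_tokens=197, dim=768)
ffn_share = macs["ffn"] / (macs["mhsa"] + macs["ffn"])
print(f"FFN share of block MACs: {ffn_share:.1%}")  # prints 63.9%
```

Under these assumptions the share reduces to 8D / (12D + 2N), where D is the embedding dimension and N the token count, so the FFN dominates whenever N is small relative to D. This is consistent with the abstract's claim, although the paper may count FLOPs differently.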