Beyond Surrogate Gradients: Fully Differentiable Token Pruning for Vision-Language Models

arXiv cs.CV | 2026-05-28 | Authors: Landi He, Mingde Yao, Shawn Young, Lijian Xu

Abstract

arXiv:2605.28051v1

Visual token pruning reduces the computational cost of Vision-Language Models (VLMs) by removing redundant visual tokens. Existing methods typically rely on Gumbel-Softmax to approximate discrete selection during training. Optimization is then driven by surrogate gradients rather than by the true selection process, which makes the learned token importance unreliable. In this paper, we propose DiffPrune, which reformulates pruning as continuous control of token information instead of discrete selection learning. Specifically, we introduce an Information Throttler that modulates each token using variance-preserving noise conditioned on importance scores, so that tokens with higher scores undergo less information suppression during training. Because this design operates directly on token representations, it naturally provides a fully differentiable optimization path for learning token importance.
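The abstract does not give the exact form of the Information Throttler, so the following is only an illustrative sketch of what "variance-preserving noise conditioned on importance scores" could look like. It assumes a sigmoid mapping from score to a keep ratio, a square-root blend with standard Gaussian noise, and tokens that are roughly zero-mean and unit-variance; the function name `information_throttler` and all parameters are hypothetical and are not taken from the paper.

```python
import numpy as np

def information_throttler(tokens, scores, rng):
    """Hypothetical sketch, not the paper's definition.

    tokens: (num_tokens, dim) array, assumed roughly zero-mean, unit-variance.
    scores: (num_tokens,) real-valued importance logits.
    Returns tokens blended with Gaussian noise; higher scores keep more signal.
    """
    # Assumed mapping: squash each score to a keep ratio in (0, 1).
    keep = 1.0 / (1.0 + np.exp(-scores))
    keep = keep[:, None]  # broadcast over the feature dimension

    noise = rng.standard_normal(tokens.shape)

    # Square-root weights keep the per-token variance near 1 when the
    # token and the noise are independent with unit variance.
    return np.sqrt(keep) * tokens + np.sqrt(1.0 - keep) * noise

rng = np.random.default_rng(0)
tokens = rng.standard_normal((4, 4096))
scores = np.array([-8.0, -1.0, 1.0, 8.0])  # low to high importance

out = information_throttler(tokens, scores, rng)

# Correlation with the original token measures how much information survives.
corr = [float(np.corrcoef(tokens[i], out[i])[0, 1]) for i in range(4)]
var = out.var(axis=1)
```

In this sketch the output is a smooth function of `scores`, so in an autodiff framework a gradient could flow from the task loss back to the scores without a discrete sampling step. Whether DiffPrune uses this particular blend is not stated in the abstract.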