OccamToken: Efficient VLM Inference with Training-Free and Budget-Adaptive Token Pruning (Event)

PRODUCT_LAUNCH · 2026-05-29 · Impact: MEDIUM

arXiv:2605.29657v1 · Announce Type: new

Abstract: Vision-language models (VLMs) rely on long visual token sequences for visual understanding, which makes the prefill stage expensive in both computation and memory. Most existing pruning methods follow an absolute-ranking paradigm: they assign an importance score to each visual token and retain a fixed top-K subset. In this work, we argue that this paradigm is fundamentally …
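To make the "absolute-ranking" baseline concrete, here is a minimal sketch of fixed top-K visual token pruning. It illustrates the paradigm the abstract argues against, not OccamToken's own method, which the truncated abstract does not describe. The function name `topk_prune` and the toy scores are hypothetical. The sketch assumes importance scores have already been computed per visual token (for example, from attention weights) and are passed in as plain floats.

```python
from typing import List, Sequence, Tuple


def topk_prune(tokens: Sequence, scores: Sequence[float], k: int) -> Tuple[List, List[int]]:
    """Hypothetical absolute-ranking pruner: keep the k highest-scoring tokens.

    The same k is used regardless of image content, which is the "fixed
    top-K" budget the abstract refers to. Returns the kept tokens in their
    original order together with their original indices.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if k <= 0:
        return [], []
    k = min(k, len(tokens))
    # Rank token indices by score, highest first. Python's sort stays stable
    # with reverse=True, so equal scores keep the lower index first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Keep the top-k indices, then restore their original (positional) order.
    kept = sorted(ranked[:k])
    return [tokens[i] for i in kept], kept


# Usage with six toy visual tokens and made-up importance scores.
tokens = ["t0", "t1", "t2", "t3", "t4", "t5"]
scores = [0.10, 0.90, 0.30, 0.80, 0.05, 0.40]
kept_tokens, kept_idx = topk_prune(tokens, scores, k=3)
# kept_idx -> [1, 3, 5], kept_tokens -> ['t1', 't3', 't5']
```

Only the three retained tokens would enter the prefill pass, so prefill cost shrinks with K. Because K is fixed in advance, however, an easy image and a cluttered one receive the same budget. The title's "budget-adaptive" framing appears to target that rigidity.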
