Elastic ViTs from Pretrained Models without Retraining

ArXiv CS.CV, 2026-06-01. Authors: Walter Simoncini, Michael Dorkenwald, Tijmen Blankevoort, Cees G. M. Snoek, Yuki M. Asano

Abstract

arXiv:2510.17700v2 (announce type: replace)

Vision foundation models achieve remarkable performance but are only available in a limited set of pre-determined sizes, forcing sub-optimal deployment choices under real-world constraints. We introduce SnapViT (Single-shot network approximation for pruned Vision Transformers), a new post-pretraining structured pruning method that enables elastic inference across a continuum of compute budgets. Our approach efficiently combines gradient information with cross-network structure correlations, approximated via an evolutionary algorithm. It does not require labeled data, generalizes to models without a classification head, and is retraining-free. Experiments on DINO, SigLIPv2, DeiT, and AugReg models demonstrate superior performance over state-of-the-art methods across various sparsities, requiring less than five minutes on a single A100 GPU to generate elastic models that can be adjusted to any computational budget.
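To make the idea of "elastic" structured pruning concrete, here is a minimal pure-Python sketch of the general pattern: rank the hidden units of one MLP block once, then slice the weights to whatever budget is requested. It is not the SnapViT algorithm. The importance scores are hypothetical placeholder numbers (the abstract says the real scores combine gradient information with cross-network structure correlations approximated by an evolutionary algorithm), and the names `rank_units`, `select_units`, and `prune_mlp` are invented for this illustration.

```python
def rank_units(scores):
    """Return unit indices sorted from most to least important."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def select_units(ranking, keep_ratio):
    """Keep the top `keep_ratio` fraction of units, in original index order."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, int(keep_ratio * len(ranking)))  # always keep at least one unit
    return sorted(ranking[:k])


def prune_mlp(w_in, w_out, kept):
    """Slice a two-layer MLP down to the kept hidden units.

    w_in  has shape hidden x d_in  (one row per hidden unit).
    w_out has shape d_out x hidden (one column per hidden unit).
    """
    w_in_pruned = [w_in[i] for i in kept]
    w_out_pruned = [[row[i] for i in kept] for row in w_out]
    return w_in_pruned, w_out_pruned


# Hypothetical importance scores for 4 hidden units (placeholder values).
scores = [0.2, 0.9, 0.05, 0.6]
ranking = rank_units(scores)  # computed once, reused for every budget

w_in = [[1, 2], [3, 4], [5, 6], [7, 8]]    # 4 hidden units, d_in = 2
w_out = [[10, 20, 30, 40], [50, 60, 70, 80]]  # d_out = 2, 4 hidden units

half = select_units(ranking, 0.5)
w_in_half, w_out_half = prune_mlp(w_in, w_out, half)
```

Because the ranking is computed once, moving to a different budget only requires calling `select_units` with a new ratio and re-slicing the weights; no retraining step appears anywhere in this sketch.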
