Once-For-All: A Train-Once and Select-Anytime Framework for Multimodal Instruction Tuning

ArXiv CS.CV, 2026-05-27. Authors: Mingkang Dong, Hongyi Cai, Xiwen Lei, Jie Li, Tao Zhang, Muxin Pu

Abstract

arXiv:2605.26761v1. Announce Type: new.

Multimodal instruction tuning is the de facto recipe for adapting vision language models (VLMs), yet instruction data are highly redundant, making data selection critical for training efficiency. Existing methods derive selection signals from a specific model or dataset, so whenever the target model or candidate pool changes, the criteria must be recomputed from scratch at substantial cost. To address this, we propose OFA, a data selection framework that trains a reusable selector once and applies it to any dataset or model without recomputation. OFA clusters multimodal instructions in a frozen CLIP space, derives pseudo labels from the cluster structure, and trains a lightweight selector for only a few epochs; samples on which this selector is least confident are selected as the most informative. Once trained, the frozen selector transfers directly across datasets and model scales.
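The pipeline described in the abstract (cluster, pseudo-label, train a small selector, pick low-confidence samples) can be illustrated with a minimal sketch. This is not the authors' implementation. Everything specific here is an assumption made for illustration: random vectors stand in for frozen CLIP embeddings, k-means stands in for the unspecified clustering step, a softmax-regression model stands in for the "lightweight selector", and confidence is taken to be the maximum softmax probability. The function names `kmeans`, `train_selector`, and `select_least_confident` are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def kmeans(x, k, iters=20, seed=0):
    # Plain Lloyd's k-means (assumed clustering method); returns one cluster id per row.
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = x[labels == j].mean(0)
    return labels

def train_selector(x, y, k, epochs=3, lr=0.5):
    # Softmax regression trained for a few full-batch epochs on the pseudo labels.
    w = np.zeros((x.shape[1], k))
    b = np.zeros(k)
    onehot = np.eye(k)[y]
    for _ in range(epochs):
        grad = (softmax(x @ w + b) - onehot) / len(x)
        w -= lr * x.T @ grad
        b -= lr * grad.sum(0)
    return w, b

def select_least_confident(x, selector, budget):
    # Score each sample by max softmax probability and keep the lowest `budget`.
    w, b = selector
    conf = softmax(x @ w + b).max(axis=1)
    return np.argsort(conf)[:budget]

# Stand-in for frozen CLIP embeddings of the pool used to train the selector.
rng = np.random.default_rng(0)
train_pool = rng.normal(size=(200, 16))
pseudo_labels = kmeans(train_pool, k=4)
selector = train_selector(train_pool, pseudo_labels, k=4)

# The trained selector is then reused, unchanged, on a different candidate pool.
new_pool = rng.normal(size=(50, 16))
picked = select_least_confident(new_pool, selector, budget=10)
```

In this sketch the selector is fit once on `train_pool` and then applied to `new_pool` without retraining, which mirrors the train-once, select-anytime idea at toy scale.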
