MAGIC: Multimodal Alignment & Grounding-aware Instruction Coreset for Vision-Language Models (Event)

SHUTDOWN | 2026-05-26 | Impact: LOW

arXiv:2605.26004v1
Announce Type: new

Abstract: Instruction tuning of large vision-language models (LVLMs) increasingly depends on massive multimodal corpora. Yet these datasets contain samples with substantial redundancy, low visual dependency, and highly imbalanced coverage of multimodal reasoning behaviors. As a result, uniform subsampling or naive score-based selection often yields suboptimal training …
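The abstract argues that naive score-based selection struggles with redundant data. A toy sketch can show why. This is not the MAGIC method, because the abstract is cut off before describing it. Instead, the sketch contrasts plain top-k selection by score with maximal marginal relevance (MMR), a generic greedy redundancy-aware heuristic swapped in purely for illustration. The per-sample scores, the feature vectors, the trade-off weight `lam`, and the helper names are all hypothetical placeholders.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def naive_topk(scores: Sequence[float], k: int) -> List[int]:
    """Naive score-based selection: keep the indices of the k highest scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def mmr_select(scores: Sequence[float], feats: Sequence[Sequence[float]],
               k: int, lam: float = 0.5) -> List[int]:
    """Greedy MMR selection (generic heuristic, NOT the paper's method).

    Each step picks the remaining sample maximizing
        lam * score - (1 - lam) * max cosine similarity to already-selected samples,
    so near-duplicates of earlier picks are penalized.
    """
    selected: List[int] = []
    remaining = set(range(len(scores)))

    def gain(i: int) -> float:
        redundancy = max((cosine(feats[i], feats[j]) for j in selected), default=0.0)
        return lam * scores[i] - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        # Iterate in sorted order so ties resolve deterministically to the lowest index.
        best = max(sorted(remaining), key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: three near-duplicate high-scoring samples and one distinct sample.
feats = [[1.0, 0.0], [1.0, 0.01], [0.99, 0.0], [0.0, 1.0]]
scores = [0.9, 0.89, 0.88, 0.7]
print(naive_topk(scores, 2))         # [0, 1]
print(mmr_select(scores, feats, 2))  # [0, 3]
```

With a budget of two samples, top-k selection spends both slots on near-duplicates. The redundancy penalty trades a small amount of score for coverage of the distinct sample. Real coreset methods would also need the visual-dependency and behavior-coverage signals the abstract mentions, which this toy omits.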