RAVE: Re-Allocating Visual Attention in Large Multimodal Models

Event: PRODUCT_LAUNCH | 2026-05-27 | Impact: MEDIUM

arXiv:2605.18359v2
Announce Type: replace

Abstract: Large multimodal models (LMMs) inherit the self-attention mechanism of pretrained language backbones, yet standard attention can exhibit suboptimal allocation, including cross-modal misallocation between textual and visual evidence and intra-visual imbalance among visual tokens. We propose RAVE (Re-Allocating Visual Attention), a lightweight pair-gating mechanism that adds a learn