Attend to Evidence: Evidence-Anchored Spatial Attention Supervision for Multimodal RLVR

Event: PRODUCT_LAUNCH | Date: 2026-06-01 | Impact: MEDIUM

arXiv:2605.30912v1 Announce Type: new

Abstract: Reinforcement learning with verifiable rewards (RLVR) improves vision-language models (VLMs) by optimizing outcome rewards derived from final answers. However, such outcome-only rewards do not tell the model which image regions justify an answer. For questions that require visual grounding, these rewards cannot distinguish responses supported by relevant visual