Attend to Evidence: Evidence-Anchored Spatial Attention Supervision for Multimodal RLVR

Event: PRODUCT_LAUNCH | 2026-06-01 | Impact: MEDIUM

arXiv:2605.30912v1 (Announce Type: new)

Abstract: Reinforcement learning with verifiable rewards (RLVR) improves vision-language models (VLMs) by optimizing outcome rewards derived from final answers. However, such outcome-only rewards do not tell the model which image regions justify an answer. For questions that require visual grounding, these rewards cannot distinguish responses supported by relevant visual
