Learning Deliberately, Acting Intuitively: Unlocking Test-Time Reasoning in Multimodal LLMs

arXiv cs.CV, 2026-05-28. Authors: Yahan Yu, Yuyang Dong, Masafumi Oyamada

Abstract

arXiv:2507.06999v2

Reasoning is essential for large language models (LLMs), especially in complex tasks such as mathematical problem solving. However, multimodal reasoning still faces challenges in modality alignment and training scalability, because many existing methods rely on additional annotations or complex rule-based rewards. To address these issues, we propose the Deliberate-to-Intuitive reasoning framework (D2I), which improves the understanding and reasoning abilities of multimodal LLMs (MLLMs) without extra annotations or complex rewards. During training, D2I applies deliberate reasoning strategies, supervised only by rule-based format rewards, to enhance modality alignment. During inference, it shifts to intuitive reasoning: the explicit strategies are removed, and the model implicitly applies the abilities it acquired during training.
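The train/inference switch described in the abstract can be illustrated with a small sketch. This is a hypothetical example, not the authors' implementation. The abstract does not say which deliberate strategies D2I uses or how its format reward is checked, so the instruction text, the `<think>`/`<answer>` tags, and the helper names `build_prompt` and `format_reward` are all assumptions made for illustration.

```python
import re

# HYPOTHETICAL deliberate-reasoning instruction, attached only during training.
# D2I's actual strategies are not given in the abstract; this text is a placeholder.
DELIBERATE_INSTRUCTION = (
    "First describe the relevant visual content, then reason step by step, "
    "and finally give the answer. Use the format "
    "<think>...</think><answer>...</answer>."
)

# HYPOTHETICAL expected layout: one non-empty reasoning block, then one non-empty answer block.
FORMAT_PATTERN = re.compile(
    r"^\s*<think>.+?</think>\s*<answer>.+?</answer>\s*$", re.DOTALL
)


def build_prompt(question: str, training: bool) -> str:
    """Prepend the explicit strategy at training time; use the bare question at inference."""
    if training:
        return f"{DELIBERATE_INSTRUCTION}\n\n{question}"
    return question


def format_reward(response: str) -> float:
    """Rule-based format reward: 1.0 if the response matches the assumed layout, else 0.0.

    Only the structure is checked; the content of the reasoning is not scored.
    """
    return 1.0 if FORMAT_PATTERN.match(response) else 0.0


if __name__ == "__main__":
    q = "How many red blocks are in the image?"
    print(build_prompt(q, training=True))   # deliberate: strategy + question
    print(build_prompt(q, training=False))  # intuitive: question only
    print(format_reward("<think>Two red blocks are visible.</think><answer>2</answer>"))
    print(format_reward("2"))
```

The reward scores only the shape of the response, which mirrors the abstract's claim that training needs no extra annotations. How such a reward is combined with an optimizer (for example, an RL-style policy update) is not described in the abstract and is left out here.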