Continuous Reasoning for Vision-Language-Action

ArXiv CS.AI | 2026-06-02 | Authors: Yueh-Hua Wu, Tatsuya Matsushima, Kei Ota

Abstract

arXiv:2606.00229v1 (announce type: cross)

Natural language is a powerful reasoning medium for language and vision-language models, but it is mismatched to the granularity of continuous control. Text and explicit subgoals operate at task-level granularity, whereas vision-language-action (VLA) policies must choose actions at a much finer temporal scale; a single reasoning step can therefore span many action chunks while remaining only weakly coupled to the action needed now. This suggests a different question for VLA: what should play the role of language? We argue that a useful VLA reasoning medium must be shareable across model instances, verifiable through downstream action improvement, and aligned with temporally extended control structure. Based on this view, we propose Continuous Reasoning for Vision-Language-Action.
