See, Plan, Rewind: Progress-Aware Vision-Language-Action Models for Robust Robotic Manipulation

ArXiv CS.CV | 2026-06-02 | NEWS | en | Authors: Tingjun Dai, Mingfei Han, Tingwen Du, Zhiheng Liu, Zihao Zhang, Zhihui Li, Salman Khan, Jun Yu, Xiaojun Chang

Details

Source site
ArXiv CS.CV
Authors
Tingjun Dai, Mingfei Han, Tingwen Du, Zhiheng Liu, Zihao Zhang, Zhihui Li, Salman Khan, Jun Yu, Xiaojun Chang
Article type
NEWS
Language
en
Publication date
2026-06-02

Abstract

arXiv:2603.09292v2 (announce type: replace-cross)

Measurement of task progress through explicit, actionable milestones is critical for robust robotic manipulation. This progress awareness enables a model to ground its current task status, anticipate verifiable intermediate states, and detect and recover from failures when progress stalls. To embody this capability, we introduce See, Plan, Rewind (SPR), a progress-aware vision-language-action framework that dynamically grounds language instructions into a sequence of spatial subgoals. SPR operates through a continuous core cycle: Seeing the current state and upcoming milestone, Planning a trajectory towards the next 2D waypoint, and Rewinding to a recoverable state upon failure by monitoring progress against the expected sequence. This closed-loop approach enables robust error correction without requiring additional training data or auxiliary models.
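The see/plan/rewind cycle described in the abstract can be illustrated with a toy closed loop. The sketch below is not the authors' implementation: it replaces the vision-language-action model with a hypothetical point agent on a 2D plane, and every name in it (`ProgressMonitor`, `make_controller`, `run_episode`, `patience`, `tol`, `drift_steps`) is an assumption made for illustration. It only shows the general pattern of tracking distance to the next waypoint, declaring a stall when that distance stops improving, and restoring the last state known to be good.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

Point = Tuple[float, float]


def distance(a: Point, b: Point) -> float:
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5


@dataclass
class ProgressMonitor:
    """Hypothetical stall detector: reports failure once the distance to the
    next waypoint has not improved for `patience` consecutive steps."""
    patience: int = 3
    best: float = float("inf")
    stalled_steps: int = 0

    def reset(self) -> None:
        self.best = float("inf")
        self.stalled_steps = 0

    def update(self, dist: float) -> bool:
        if dist < self.best - 1e-9:
            self.best = dist
            self.stalled_steps = 0
        else:
            self.stalled_steps += 1
        return self.stalled_steps >= self.patience


def make_controller(drift_steps: Set[int], speed: float = 0.5) -> Callable:
    """Toy stand-in for a policy: moves straight toward the goal, except on
    the time steps in `drift_steps`, where it drifts sideways (a simulated failure)."""
    def step(state: Point, goal: Point, t: int) -> Point:
        if t in drift_steps:
            return (state[0] + 0.2, state[1])
        d = distance(state, goal)
        if d <= speed:
            return goal
        return (state[0] + speed * (goal[0] - state[0]) / d,
                state[1] + speed * (goal[1] - state[1]) / d)
    return step


def run_episode(start: Point, waypoints: List[Point], step_fn: Callable,
                max_steps: int = 100, tol: float = 0.05, patience: int = 3):
    state = start
    checkpoint = start            # last state assumed to be recoverable
    idx = 0                       # index of the next waypoint (milestone)
    monitor = ProgressMonitor(patience)
    log: List[str] = []
    for t in range(max_steps):
        if idx == len(waypoints):
            break
        goal = waypoints[idx]                 # "see": current state and next milestone
        state = step_fn(state, goal, t)       # "plan": act toward the next waypoint
        dist = distance(state, goal)
        if dist <= tol:
            log.append(f"reached {idx}")
            checkpoint = state
            idx += 1
            monitor.reset()
        elif monitor.update(dist):            # progress stalled
            log.append(f"rewind {idx}")
            state = checkpoint                # "rewind": restore the recoverable state
            monitor.reset()
    return idx == len(waypoints), log


# Example run: the agent drifts off course on steps 2 to 5, then recovers.
success, log = run_episode((0.0, 0.0), [(1.0, 0.0), (1.0, 1.0)],
                           make_controller({2, 3, 4, 5}))
print(success, log)
```

In this toy run the agent reaches the first waypoint, drifts while heading for the second, is rewound to the checkpoint once the monitor flags the stall, and then completes the task. How SPR itself defines milestones, measures progress, or chooses the recoverable state is not specified in the abstract, so those choices here are placeholders.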