Visual Verification Enables Inference-time Steering and Autonomous Policy Improvement

Details

Source: ArXiv CS.AI
Authors: Mingtong Zhang, Dhruv Shah
Article type: NEWS
Language: en
Publication date: 2026-06-17

Abstract

arXiv:2606.18247v1 (announce type: cross)

Robots deployed in the real world should learn from their experience and improve over time. This requires a mechanism for practicing and learning from feedback. In this paper, we propose VERITAS, a generator-verifier framework that gives generalist robot policies inference-time steering and self-improvement. We use a pre-trained generalist robot policy as a "generator" and pair it with a gradient-free "visual verifier" that evaluates actions at inference time. This framework enables inference-time steering that improves policy performance without additional training. We demonstrate that inference-time verification consistently outperforms vanilla generalists without training on additional demonstration data. Additionally, we demonstrate that the verified rollouts provide effective supervision for offline policy improvement: policies fine-tuned on verified self-generated trajectories achieve consistent performance gains.
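The abstract does not say how the verifier's evaluations are turned into a steering decision. One common way to use a generator-verifier pair at inference time is best-of-N selection: sample several candidate actions, score each, and execute the highest-scoring one. The sketch below illustrates that pattern only and is not the VERITAS implementation. The names `steer`, `filter_verified`, `toy_generator`, and `toy_verifier` are hypothetical, and the acceptance threshold used to keep rollouts for fine-tuning is likewise an assumption.

```python
import random
from typing import Callable, List, Tuple

Observation = Tuple[float, float]
Action = Tuple[float, float]


def steer(
    observation: Observation,
    generator: Callable[[Observation], Action],
    verifier: Callable[[Observation, Action], float],
    num_candidates: int = 8,
) -> Tuple[Action, float]:
    """Best-of-N selection (assumed scheme): sample candidate actions from
    the generator and return the one the verifier scores highest."""
    candidates = [generator(observation) for _ in range(num_candidates)]
    scores = [verifier(observation, action) for action in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]


def filter_verified(
    rollouts: List[Tuple[str, float]], threshold: float
) -> List[str]:
    """Keep only rollouts whose verifier score meets a (hypothetical)
    threshold; the survivors would serve as fine-tuning data."""
    return [trajectory for trajectory, score in rollouts if score >= threshold]


# Toy stand-ins, purely illustrative: the "policy" proposes random 2D
# actions and the "verifier" prefers actions close to the observation.
def toy_generator(observation: Observation) -> Action:
    return (random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0))


def toy_verifier(observation: Observation, action: Action) -> float:
    return -abs(action[0] - observation[0]) - abs(action[1] - observation[1])


random.seed(0)
chosen_action, chosen_score = steer((0.5, 0.5), toy_generator, toy_verifier)
```

Neither function computes a gradient or updates the generator, which matches the abstract's description of a gradient-free verifier that improves performance without additional training. The filtering step is one possible reading of "verified rollouts" as supervision for offline fine-tuning.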
