From Demonstrations to Rewards: Test-Time Prompt Optimization for VLM Reward Models

Event: PRODUCT_LAUNCH | Date: 2026-06-02 | Impact: MEDIUM

arXiv:2606.00083v1 | Announce Type: cross

Abstract: Reinforcement learning relies on accurate reward functions, which are often hand-crafted or even unavailable in real-world applications such as robotics. Recent work has explored the zero-shot reasoning capabilities of pre-trained Vision-Language Models (VLMs) as reward models. However, without careful prompt engineering, these approaches tend to produce subopti