Plan-and-Verify Video Reward Reasoning with Spatio-Temporal Scene Graph Grounding | Event

PRODUCT_LAUNCH | 2026-06-11 | Impact: MEDIUM

arXiv:2606.11838v1 | Announce Type: new

Abstract: Reward models for text-to-video (T2V) generation guide post-training but often fail at fine-grained semantic alignment. We trace this to two structural weaknesses in existing reasoning-based reward models: they do not systematically verify every condition described in the prompt, and the visual evidence supporting each judgment remains implicit in their free-form rea