CaC: Advancing Video Reward Models via Hierarchical Spatiotemporal Concentrating

Event: PRODUCT_LAUNCH | Date: 2026-05-29 | Impact: MEDIUM

arXiv:2605.11723v2 (announce type: replace)

Abstract: In this paper, we propose Concentrate and Concentrate (CaC), a coarse-to-fine anomaly reward model based on Vision-Language Models. During inference, it first conducts a global temporal scan to anchor anomalous time windows, then performs fine-grained spatial grounding within the localized interval, and finally derives robust judgments via structured spatiotempora
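
The coarse-to-fine inference order described in the abstract (temporal scan, then spatial grounding, then a judgment) can be illustrated with a toy sketch. This is not the authors' implementation: the paper uses a Vision-Language Model, whereas the sketch below assumes the per-cell anomaly scores are already given as plain numbers. All names (`temporal_scan`, `spatial_ground`, `judge`, `Judgment`), the fixed window width, and the `1 - peak` reward rule are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Grid = List[List[float]]  # per-frame grid of anomaly scores in [0, 1] (assumed input)


@dataclass
class Judgment:
    window: Tuple[int, int]  # inclusive frame range flagged as most anomalous
    region: Tuple[int, int]  # (row, col) of the most anomalous cell in that window
    reward: float            # toy reward: 1 minus the peak score in that cell


def temporal_scan(frame_scores: Sequence[float], width: int) -> Tuple[int, int]:
    """Coarse step: pick the fixed-width window with the highest total score."""
    width = min(width, len(frame_scores))
    best_start = max(
        range(len(frame_scores) - width + 1),
        key=lambda s: sum(frame_scores[s:s + width]),
    )
    return best_start, best_start + width - 1


def spatial_ground(grids: Sequence[Grid], window: Tuple[int, int]) -> Tuple[int, int]:
    """Fine step: inside the window, pick the cell with the highest summed score."""
    start, end = window
    rows, cols = len(grids[0]), len(grids[0][0])
    totals = {
        (r, c): sum(grids[t][r][c] for t in range(start, end + 1))
        for r in range(rows)
        for c in range(cols)
    }
    return max(totals, key=totals.get)


def judge(grids: Sequence[Grid], width: int = 2) -> Judgment:
    """Run the scan, then the grounding, then derive a toy reward."""
    # Hypothetical frame-level score: the largest cell score in each frame.
    frame_scores = [max(max(row) for row in grid) for grid in grids]
    window = temporal_scan(frame_scores, width)
    row, col = spatial_ground(grids, window)
    peak = max(grids[t][row][col] for t in range(window[0], window[1] + 1))
    return Judgment(window=window, region=(row, col), reward=1.0 - peak)


# Four 2x2 frames; the anomaly is concentrated in the bottom-right cell of frames 2-3.
video = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.1], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 0.9]],
    [[0.0, 0.0], [0.0, 0.8]],
]
result = judge(video)
print(result)
```

The point of the ordering is that the spatial search only runs over frames selected by the temporal scan, so the fine-grained step looks at a small slice of the video rather than every frame.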