Adaptive Dense Evidence Refinement for Video Relational Reasoning for VRR-QA Challenge

arXiv cs.CV | 2026-06-02 | News | English
Authors: Yuyang Sun, Yongliang Wu, Xingyu Zhu, Yuxia Chen, Zhenxiang Jiang, Yangguang Ji, Wenbo Zhu, Yanxi Shi, Jay Wu, Shuo Wang, Xu Yang

Abstract

arXiv:2606.01104v1 (announce type: new)

VRR-QA evaluates whether video-language systems can infer spatial, temporal, viewpoint, depth, and visibility relations that a single frame cannot always resolve. We present an inference-only system built around adaptive test-time computation. The system first answers each question with a single direct pass of a video-language model, then uses multiple lightweight views to identify unstable questions. Only these difficult questions are routed to a high-budget dense evidence module, which constructs timestamped frame observations, runs relation-specific probes, verifies candidate answers, and aggregates temporal evidence conservatively. This design separates two problems that are often conflated in video question answering: finding plausible alternative answers, and deciding when the current answer should actually be changed. On the test split, the final system achieves an average accuracy of 90.07 and a macro-average accuracy of 87.81.
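The abstract describes the control flow only at a high level. As an illustration, the following Python sketch shows one way adaptive routing with a conservative answer-change rule could be wired together. It assumes that "unstable" means low agreement between the direct answer and the lightweight-view answers, that the dense evidence module can be reduced to a `dense_scorer` callable returning one score per candidate answer, and that "conservative" means switching only when an alternative beats the current answer by a fixed margin. All names (`is_unstable`, `conservative_override`, `answer_question`, `dense_scorer`) and thresholds (`min_agreement=0.75`, `margin=0.2`) are hypothetical and do not come from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Decision:
    answer: str
    routed: bool   # True if the (hypothetical) dense evidence stage was invoked
    changed: bool  # True if the final answer differs from the direct pass


def is_unstable(direct: str, view_answers: Sequence[str], min_agreement: float = 0.75) -> bool:
    """Flag a question when too few lightweight views agree with the direct answer."""
    if not view_answers:
        return False  # no views means no evidence of instability
    agree = sum(1 for a in view_answers if a == direct)
    return agree / len(view_answers) < min_agreement


def conservative_override(direct: str, scores: dict[str, float], margin: float = 0.2) -> str:
    """Keep the direct answer unless an alternative outscores it by at least `margin`."""
    current = scores.get(direct, 0.0)
    best_alt, best_score = max(
        ((a, s) for a, s in scores.items() if a != direct),
        key=lambda item: item[1],
        default=(direct, current),  # no alternatives: fall back to the direct answer
    )
    return best_alt if best_score - current >= margin else direct


def answer_question(
    direct: str,
    view_answers: Sequence[str],
    dense_scorer: Callable[[list[str]], dict[str, float]],
    min_agreement: float = 0.75,
    margin: float = 0.2,
) -> Decision:
    # Stable questions keep the cheap direct answer; no extra compute is spent.
    if not is_unstable(direct, view_answers, min_agreement):
        return Decision(direct, routed=False, changed=False)
    # Candidate set: direct answer plus alternatives proposed by the views (order-preserving dedup).
    candidates = list(dict.fromkeys([direct, *view_answers]))
    scores = dense_scorer(candidates)  # stand-in for the high-budget evidence module
    final = conservative_override(direct, scores, margin)
    return Decision(final, routed=True, changed=final != direct)


if __name__ == "__main__":
    # Toy scorer standing in for the dense evidence module (made-up numbers).
    def toy_scorer(candidates: list[str]) -> dict[str, float]:
        table = {"left": 0.30, "right": 0.65}
        return {c: table.get(c, 0.0) for c in candidates}

    print(answer_question("left", ["left", "left", "left", "left"], toy_scorer))
    # Decision(answer='left', routed=False, changed=False)
    print(answer_question("left", ["right", "left", "right", "left"], toy_scorer))
    # Decision(answer='right', routed=True, changed=True)
```

In this sketch the two problems named in the abstract are kept apart: the lightweight views only propose candidates and trigger routing, while the separate margin rule decides whether the answer is actually replaced. Raising `margin` makes the system more reluctant to overturn the direct answer.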