WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation 文章

ArXiv CS.CV2026-05-26NEWSen作者: Kaining Ying, Hengrui Hu, Siyu Ren, Jiamu Li, Fengjiao Chen, Ziwen Wang, Xuezhi Cao, Xunliang Cai, Henghui Ding

查看原文 →

关系图谱

摘要

arXiv:2605.25874v1 Announce Type: new Abstract: Interactive world models are advancing rapidly, yet existing benchmarks cover only part of the required competencies, leaving no unified standard for systematic evaluation. To fill this gap, we introduce WBench, a comprehensive multi-turn benchmark for interactive world model evaluation along five dimensions, namely video quality, setting adherence, interaction adherence, consistency, and physics compliance. WBench contains 289 test cases and 1,058 interaction turns, where each case specifies a world setting and a multi-turn interaction sequence, covering diverse scenes, styles, subjects, and both first- and third-person perspectives, together with four interaction types, including navigation, subject action, event editing, and perspective switching. For navigation, WBench unifies text, 6-DoF pose, and discrete-action control, enabling evaluation of models with different native input interfaces.

WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation 文章

摘要

相关事件查看全部 (2)

相关公司

相关人物

相关产品查看全部 (3)

相关技术