HanDyVQA: A Video QA Benchmark for Fine-Grained Hand-Object Interaction Dynamics 文章

ArXiv CS.CV2026-06-16NEWSen作者: Masatoshi Tateno, Gido Kato, Hirokatsu Kataoka, Yoichi Sato, Takuma Yagi

详细信息

来源站点: ArXiv CS.CV
作者: Masatoshi Tateno, Gido Kato, Hirokatsu Kataoka, Yoichi Sato, Takuma Yagi
文章类型: NEWS
语言: en
发布日期: 2026-06-16

摘要

arXiv:2512.00885v2 Announce Type: replace Abstract: Hand-object interaction (HOI) inherently involves dynamics where human manipulations produce distinct spatio-temporal effects on objects. However, existing semantic HOI benchmarks focused either on manipulation or on the resulting effects at a coarse level, lacking fine-grained spatio-temporal reasoning to capture the underlying dynamics in HOI. We introduce HanDyVQA, a fine-grained video question-answering benchmark that comprehensively covers both the manipulation and effect aspects of HOI. HanDyVQA comprises six complementary question types (Action, Process, Objects, Location, State Change, and Object Parts), totalling 11.1K multiple-choice QA pairs. Collected QA pairs recognizing manipulation styles, hand/object motions, and part-level state changes. HanDyVQA also includes 10.3K segmentation masks for Objects and Object Parts questions, enabling the evaluation of object/part-level reasoning in video object segmentation.

HanDyVQA: A Video QA Benchmark for Fine-Grained Hand-Object Interaction Dynamics 文章

详细信息

摘要

相关事件

相关公司

相关人物

相关产品查看全部 (3)

相关技术查看全部 (2)