HanDyVQA: A Video QA Benchmark for Fine-Grained Hand-Object Interaction Dynamics 文章

ArXiv CS.CV2026-06-16NEWSen作者: Masatoshi Tateno, Gido Kato, Hirokatsu Kataoka, Yoichi Sato, Takuma Yagi

详细信息

来源站点
ArXiv CS.CV
作者
Masatoshi Tateno, Gido Kato, Hirokatsu Kataoka, Yoichi Sato, Takuma Yagi
文章类型
NEWS
语言
en
发布日期
2026-06-16

摘要

arXiv:2512.00885v2 Announce Type: replace Abstract: Hand-object interaction (HOI) inherently involves dynamics where human manipulations produce distinct spatio-temporal effects on objects. However, existing semantic HOI benchmarks focused either on manipulation or on the resulting effects at a coarse level, lacking fine-grained spatio-temporal reasoning to capture the underlying dynamics in HOI. We introduce HanDyVQA, a fine-grained video question-answering benchmark that comprehensively covers both the manipulation and effect aspects of HOI. HanDyVQA comprises six complementary question types (Action, Process, Objects, Location, State Change, and Object Parts), totalling 11.1K multiple-choice QA pairs. Collected QA pairs recognizing manipulation styles, hand/object motions, and part-level state changes. HanDyVQA also includes 10.3K segmentation masks for Objects and Object Parts questions, enabling the evaluation of object/part-level reasoning in video object segmentation.

相关事件

暂无数据

相关公司

暂无数据

相关人物

暂无数据