HiERO-StepG @ Ego4D Step Grounding Challenge: hierarchical activity understanding enables zero-shot step grounding

arXiv cs.CV, 2026-06-01. Authors: Andrea Zenotto, Simone Alberto Peirone, Francesca Pistilli, Giuseppe Averta

Abstract

arXiv:2605.31227v1

Procedural activities follow well-defined structures: whether we consider a cooking recipe or a mechanic repairing a car, these activities naturally decompose into a hierarchy of steps and sub-steps. Traditional approaches to step grounding require extensive annotations and scale poorly. Instead, we argue that this hierarchical structure can emerge naturally from uncurated videos of human activities, through recurring patterns of co-occurring actions and activities. Our approach builds on HiERO, a weakly-supervised representation learning approach that maps functionally related actions close together in the feature space, leveraging only fine-grained action-level narrations. In this feature space, procedure steps can be detected by simple clustering, with no additional task-specific fine-tuning.
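The idea of detecting steps by clustering can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm HiERO-StepG uses, so everything here is an assumption made for illustration: plain k-means stands in for the clustering, the 2-D tuples in `features` stand in for per-segment video embeddings, and the helper names `kmeans` and `labels_to_segments` are hypothetical. The sketch clusters the segment features, then merges runs of consecutive segments that share a cluster into candidate step intervals.

```python
from math import dist


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation.

    Stand-in for the "simple clustering" mentioned in the abstract (assumption).
    """
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def labels_to_segments(labels, seconds_per_segment=1.0):
    """Merge runs of identical consecutive labels into (start, end, cluster) intervals."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append(
                (start * seconds_per_segment, i * seconds_per_segment, labels[start])
            )
            start = i
    return segments


# Toy stand-in for per-segment embeddings, in temporal order (hypothetical data).
features = [
    (0.0, 0.1), (0.1, 0.0), (0.05, 0.05),   # segments resembling one step
    (5.0, 5.1), (5.1, 4.9),                 # a second step
    (9.0, 0.0), (9.1, 0.2), (8.9, 0.1),     # a third step
]

labels = kmeans(features, k=3)
steps = labels_to_segments(labels, seconds_per_segment=2.0)
for start, end, cluster in steps:
    print(f"{start:.1f}s to {end:.1f}s: cluster {cluster}")
```

In a real pipeline the features would come from the learned representation rather than hand-written tuples, and the number of clusters would need to be chosen or estimated; the run-merging step is what turns per-segment cluster labels into temporal intervals.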
