Discovering discriminative action parts from mid-level video representations (paper)

2012 · Citations: 239

Details

Publication date
2012-06-01
Publication year
2012

Keywords

Human Pose and Action Recognition; Video Analysis and Summarization; Video Surveillance and Tracking Methods

Abstract

We describe a mid-level approach for action recognition. From an input video, we extract salient spatio-temporal structures by forming clusters of trajectories that serve as candidates for the parts of an action. The assembly of these clusters into an action class is governed by a graphical model that incorporates appearance and motion constraints for the individual parts and pairwise constraints for the spatio-temporal dependencies among them. During training, we estimate the model parameters discriminatively. During classification, we efficiently match the model to a video using discrete optimization. We validate the model's classification ability in standard benchmark datasets and illustrate its potential to support a fine-grained analysis that not only gives a label to a video, but also identifies and localizes its constituent parts.
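
The matching step described in the abstract (assigning candidate trajectory clusters to model parts under per-part and pairwise constraints) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the graph structure, the features, or the solver, so the sketch assumes a chain-structured model with precomputed scores and solves it exactly by dynamic programming. The function name `match_chain_model` and the inputs `unary` and `pairwise` are hypothetical.

```python
def match_chain_model(unary, pairwise):
    """Pick one candidate cluster per part so that the total score is maximal.

    Assumptions (not taken from the paper): parts form a chain, and all
    scores are precomputed.
      unary[p][k]       : score of assigning candidate k to part p
                          (stands in for appearance and motion terms)
      pairwise[p][i][j] : score of part p taking candidate i while
                          part p + 1 takes candidate j
                          (stands in for spatio-temporal dependencies)
    Returns (best_total_score, assignment), where assignment[p] is the
    candidate index chosen for part p.
    """
    num_parts = len(unary)
    num_candidates = len(unary[0])

    # best[k]: best score of a partial assignment ending with candidate k
    best = list(unary[0])
    back_pointers = []

    for p in range(1, num_parts):
        new_best, pointers = [], []
        for j in range(num_candidates):
            # Best predecessor candidate i for part p - 1, given part p takes j
            i_best = max(
                range(num_candidates),
                key=lambda i: best[i] + pairwise[p - 1][i][j],
            )
            pointers.append(i_best)
            new_best.append(best[i_best] + pairwise[p - 1][i_best][j] + unary[p][j])
        best = new_best
        back_pointers.append(pointers)

    # Trace the optimal assignment back from the last part
    last = max(range(num_candidates), key=lambda j: best[j])
    assignment = [last]
    for pointers in reversed(back_pointers):
        assignment.append(pointers[assignment[-1]])
    assignment.reverse()
    return best[last], assignment


# Toy usage: 3 parts, 2 candidate clusters each (made-up scores)
toy_unary = [[1.0, 0.2], [0.1, 0.9], [0.5, 0.4]]
toy_pairwise = [
    [[0.0, 0.3], [0.2, 0.0]],
    [[0.1, 0.0], [0.0, 0.6]],
]
score, parts = match_chain_model(toy_unary, toy_pairwise)
print(score, parts)
```

Because the returned assignment names a specific cluster for each part, a sketch like this yields both a score that could feed a class decision and a localization of the parts, which mirrors the fine-grained analysis the abstract mentions. A chain keeps the optimization exact and cheap; a model with a different structure would need a different solver.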