Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words 论文

2006引用 388
Human Pose and Action RecognitionVideo Analysis and SummarizationHuman Motion and Animation

摘要

Imagine a video taken on a sunny beach, can a computer automatically tell what is happening in the scene? Can it identify different human activities in the video, such as water surfing, people walking and lying on the beach? To automatically classify or localize different actions in video sequences is very useful for a variety of tasks, such as video surveillance, objectlevel video summarization, video indexing, digital library organization, etc. However, it remains a challenging task for computers to achieve robust action recognition due to cluttered background, camera motion, occlusion, and geometric and photometric variances of objects. For example, in a live video of a skating competition, the skater moves rapidly across the rink, and the camera also moves to follow the skater. With moving camera, non-stationary background, and moving target, few vision algorithms could identify, categorize and