Learning Human Motion Models for Long-Term Predictions 论文
摘要
We propose a new architecture for the learning of predictive spatio-temporal motion models from data alone. Our approach, dubbed the Dropout Autoencoder LSTM (DAELSTM), is capable of synthesizing natural looking motion sequences over long-time horizons1 without catastrophic drift or motion degradation. The model consists of two components, a 3-layer recurrent neural network to model temporal aspects and a novel autoencoder that is trained to implicitly recover the spatial structure of the human skeleton via randomly removing information about joints during training. This Dropout Autoencoder (DAE) is then used to filter each predicted pose by a 3-layer LSTM network, reducing accumulation of correlated error and hence drift over time. Furthermore to alleviate insufficiency of commonly used quality metric, we propose a new evaluation protocol using action classifiers to assess the quality of synthetic motion sequences. The proposed protocol can be used to assess quality of generated sequences of arbitrary length. Finally, we evaluate our proposed method on two of the largest motion-capture datasets available and show that our model outperforms the state-of-the-art techniques on a variety of actions, including cyclic and acyclic motion, and that it can produce natural looking sequences over longer time horizons than previous methods.