CNN-based sensor fusion techniques for multimodal human activity recognition (paper)

2017 · Cited by 235
Context-Aware Activity Recognition Systems; Anomaly Detection Techniques and Applications; Human Pose and Action Recognition

Abstract

Deep learning (DL) methods are receiving increasing attention in human activity recognition (HAR) because of their success in other machine learning domains. Nonetheless, these methods often cannot be transferred directly because of domain-specific challenges (e.g. handling multimodal sensor data and the lack of large labeled datasets). In this paper, we address three key questions for the future development of robust DL methods for HAR: (1) Is it beneficial to apply data-specific normalization? (2) How can multimodal sensor data be fused optimally? (3) How robust are these approaches with respect to the amount of available training data? We evaluate convolutional neural networks (CNNs) on a new, large, real-world multimodal dataset (RBK) as well as on the PAMAP2 dataset. Our results indicate that sensor-specific normalization techniques are required. We present a novel pressure-specific normalization method that increases the F1-score by approximately 4.5 percentage points (pp) on the RBK dataset. Further, we show that late and hybrid fusion techniques are superior to early fusion techniques, increasing the F1-score by up to 3.5 pp (RBK dataset). Finally, our results reveal that CNNs based on a shared-filter approach in particular depend less on the amount of available training data than other fusion techniques.
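The abstract names the fusion strategies without defining them, so the following is only a minimal sketch of one common reading of the terms, not the authors' architecture. It assumes that early fusion means a first-layer filter spanning all sensor channels, that late fusion means one filter per channel with the feature maps concatenated afterwards, and that a shared-filter approach reuses one kernel across channels. The function names, kernels, and toy values are all hypothetical, and the per-channel z-score stands in for sensor-specific normalization in general; it is not the pressure-specific method mentioned above.

```python
from statistics import mean, pstdev


def zscore(channel):
    """Standardize one channel using its own mean and standard deviation."""
    mu, sigma = mean(channel), pstdev(channel)
    return [(x - mu) / sigma if sigma > 0 else 0.0 for x in channel]


def conv1d(signal, kernel):
    """Valid 1D cross-correlation, the operation a CNN layer applies."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def early_fusion(channels, kernels):
    """Assumed reading: one filter spans all channels, so the per-channel
    responses are summed into a single joint feature map."""
    responses = [conv1d(c, k) for c, k in zip(channels, kernels)]
    return [sum(step) for step in zip(*responses)]


def late_fusion(channels, kernels):
    """Assumed reading: each channel has its own filter; the feature maps
    stay separate and are only concatenated afterwards."""
    return [v for c, k in zip(channels, kernels) for v in conv1d(c, k)]


def shared_filter_fusion(channels, kernel):
    """Assumed reading: like late fusion, but one kernel is reused for
    every channel, so fewer weights have to be learned."""
    return late_fusion(channels, [kernel] * len(channels))


# Toy window: accelerometer-like and pressure-like readings (made-up values).
accel = [1.0, 2.0, 4.0, 3.0]
pressure = [1000.0, 1010.0, 1030.0, 1020.0]

# Normalizing each channel separately puts both on a comparable scale.
channels = [zscore(accel), zscore(pressure)]

print(len(early_fusion(channels, [[1, -1], [1, -1]])))    # one joint feature map
print(len(late_fusion(channels, [[1, -1], [0.5, 0.5]])))  # two concatenated maps
print(len(shared_filter_fusion(channels, [1, -1])))       # two maps, one kernel
```

In this sketch the shared-filter variant has half as many kernel weights as late fusion for two channels, which is one plausible intuition for why such a model might need less training data; the abstract reports the effect but does not state the mechanism.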