Soften the Mask: Adaptive Temporal Soft Mask for Efficient Dynamic Facial Expression Recognition

ArXiv CS.CV, 2026-05-29. Authors: Meng-zhu Li, Quanxing Zha, Hongjun Wu

Abstract

arXiv:2502.21004v2 (announce type: replace). Dynamic Facial Expression Recognition (DFER) facilitates the understanding of psychological intentions through non-verbal communication. Existing methods struggle to manage irrelevant information, such as background noise and redundant semantics, which hurts both efficiency and effectiveness. In this work, we propose a novel supervised temporal soft masked autoencoder network for DFER, namely AdaTosk, which integrates a parallel supervised classification branch with a self-supervised reconstruction branch. The self-supervised reconstruction branch applies a random binary hard mask to generate diverse training samples, encouraging meaningful feature representations in visible tokens. Meanwhile, the classification branch employs an adaptive temporal soft mask to flexibly mask visible tokens based on their temporal significance.
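
To make the difference between the two masks concrete, here is a minimal, framework-free sketch in Python. It is not the AdaTosk implementation. The helper names (`random_hard_mask`, `temporal_soft_mask`, `apply_mask`), the sigmoid weighting, the temperature parameter, and the toy significance scores are all assumptions made for illustration, since the abstract does not specify how temporal significance is computed or how it is turned into mask weights.

```python
import math
import random


def random_hard_mask(num_tokens, mask_ratio, rng):
    """Binary mask: 1 = visible token, 0 = masked token (hypothetical helper)."""
    num_masked = int(round(num_tokens * mask_ratio))
    masked = set(rng.sample(range(num_tokens), num_masked))
    return [0 if i in masked else 1 for i in range(num_tokens)]


def temporal_soft_mask(scores, temperature=1.0):
    """Map per-token significance scores to soft weights in (0, 1).

    Assumption: a sigmoid over mean-centred scores. The abstract does not
    state the actual weighting function.
    """
    mean = sum(scores) / len(scores)
    return [1.0 / (1.0 + math.exp(-(s - mean) / temperature)) for s in scores]


def apply_mask(tokens, weights):
    """Scale each token's feature vector by its mask weight."""
    return [[w * x for x in tok] for tok, w in zip(tokens, weights)]


rng = random.Random(0)
tokens = [[1.0, 2.0], [0.5, 0.5], [3.0, 1.0], [2.0, 2.0]]  # 4 toy tokens

# Reconstruction branch: a hard mask drops half of the tokens outright.
hard = random_hard_mask(len(tokens), mask_ratio=0.5, rng=rng)
visible = [tok for tok, keep in zip(tokens, hard) if keep]

# Classification branch: visible tokens are down-weighted, not dropped.
# The scores below are made up; a real model would have to estimate them.
scores = [0.2, 1.4]
soft = temporal_soft_mask(scores)
weighted = apply_mask(visible, soft)
```

In this sketch the hard mask makes an all-or-nothing choice per token, while the soft mask keeps every visible token and scales it by a weight strictly between 0 and 1, so tokens with lower assumed significance contribute less without being discarded.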