ConTrans: Learning Text-enhanced Local-global Temporal Representations for Zero-shot Temporal Action Localization
Event · PRODUCT_LAUNCH · 2026-06-01 · Impact: MEDIUM
arXiv:2605.30689v1 Announce Type: new
Abstract: Zero-shot Temporal Action Localization (ZS-TAL) aims to detect and locate previously unseen actions in untrimmed videos. However, existing approaches primarily focus on modeling long-range contextual information, often neglecting the critical relative-offset-based local correlations between video frames. Furthermore, their performance
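To make the distinction between long-range context and relative-offset-based local correlations concrete, here is a minimal generic sketch of windowed attention over per-frame features with a bias indexed by the relative offset between frames. This is not the ConTrans architecture, which the abstract does not describe in detail. The function name `local_relative_attention` and the `offset_bias` parameter are hypothetical; in a real model the bias would be learned and the frame features would come from a video encoder. Global self-attention would instead score every frame pair and carry no notion of how far apart two frames are.

```python
import math
from typing import List


def local_relative_attention(
    frames: List[List[float]], window: int, offset_bias: List[float]
) -> List[List[float]]:
    """Illustrative local attention: frame t only attends to frames t+d, |d| <= window.

    Score for (t, t+d) = dot(frames[t], frames[t+d]) / sqrt(dim) + offset_bias[d + window].
    offset_bias is a hypothetical per-offset term (learned in practice), so the same
    feature similarity can be weighted differently depending on relative distance.
    """
    n = len(frames)
    dim = len(frames[0])
    if len(offset_bias) != 2 * window + 1:
        raise ValueError("offset_bias must have one entry per offset in [-window, window]")

    out = []
    for t in range(n):
        scores = []
        neighbors = []
        for d in range(-window, window + 1):
            s = t + d
            if 0 <= s < n:  # neighbours outside the video are simply skipped
                dot = sum(a * b for a, b in zip(frames[t], frames[s]))
                scores.append(dot / math.sqrt(dim) + offset_bias[d + window])
                neighbors.append(frames[s])
        # Numerically stable softmax over the local window.
        m = max(scores)
        weights = [math.exp(x - m) for x in scores]
        z = sum(weights)
        weights = [w / z for w in weights]
        # Weighted sum of neighbouring frame features.
        out.append([sum(w * v[k] for w, v in zip(weights, neighbors)) for k in range(dim)])
    return out


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # Favour the immediate predecessor slightly more than the successor.
    print(local_relative_attention(feats, window=1, offset_bias=[0.5, 0.0, -0.5]))
```

Restricting the window keeps the cost linear in video length for a fixed window size, and the offset-indexed bias is what lets the model treat "one frame earlier" differently from "one frame later"; a long-range module would typically be combined with such a local branch rather than replaced by it.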