CoSTL: Comprehensive Spatial-Temporal Representation Learning for Moment Retrieval and Highlight Detection

Event: PRODUCT_LAUNCH | Date: 2026-06-02 | Impact: MEDIUM

arXiv:2606.01149v1 | Announce Type: new

Abstract: Video Moment Retrieval (MR) and Highlight Detection (HD) are crucial tasks in video analysis that aim to localize specific moments and estimate clip-wise relevance based on a given text query. Recent approaches treat them as similar video grounding tasks and use the same architecture to solve them. These tasks require both fine-grained compre