Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval

arXiv cs.CV | 2026-05-26 | Authors: Xiang Fang, Daizong Liu, Pan Zhou, Yuchong Hu

Abstract

arXiv:2209.11572v3 (announce type: replace)

As an increasingly popular task in multimedia information retrieval, video moment retrieval (VMR) aims to localize the target moment in an untrimmed video according to a given language query. Most previous methods depend heavily on large numbers of manual annotations (i.e., moment boundaries), which are extremely expensive to acquire in practice. In addition, because of the domain gap between datasets, directly applying models pre-trained on one dataset to an unseen domain leads to a significant performance drop. In this paper, we focus on a novel task: cross-domain VMR, where fully annotated datasets are available in one domain (the "source domain"), but the domain of interest (the "target domain") contains only unannotated datasets. To our knowledge, this is the first study on cross-domain VMR.

