Local-Global Video-Text Interactions for Temporal Grounding (Paper)

2020 · Citations: 282
Multimodal Machine Learning Applications · Human Pose and Action Recognition · Video Analysis and Summarization

Local-Global Video-Text Interactions for Temporal Grounding · Authors