Multilevel Language and Vision Integration for Text-to-Clip Retrieval (Paper)
2019, Proceedings of the AAAI Conference on Artificial Intelligence. Citations: 330
Multimodal Machine Learning Applications, Human Pose and Action Recognition, Video Analysis and Summarization