Jointly Modeling Deep Video and Compositional Text to Bridge Vision and Language in a Unified Framework · Paper

2015 · Proceedings of the AAAI Conference on Artificial Intelligence · Cited by 319
Multimodal Machine Learning Applications · Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis
