Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions (Paper)

2019 · Citations: 217
Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition

Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions · Authors