Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models 论文

2016International Journal of Computer Vision引用 288
Multimodal Machine Learning ApplicationsTopic ModelingNatural Language Processing Techniques