Can a Machine Generate Humanlike Language Descriptions for a Remote Sensing Image? (Paper)

2017, IEEE Transactions on Geoscience and Remote Sensing. Cited by: 222

Multimodal Machine Learning Applications; Advanced Image and Video Retrieval Techniques; Domain Adaptation and Few-Shot Learning

Artificial Intelligence

Abstract

This paper investigates an intriguing question in the remote sensing field: “can a machine generate humanlike language descriptions for a remote sensing image?” The automatic description of a remote sensing image (namely, remote sensing image captioning) is an important but rarely studied task for artificial intelligence. It is more challenging as the description must not only capture the ground elements of different scales, but also express their attributes as well as how these elements interact with each other. Despite the difficulties, we have proposed a remote sensing image captioning framework by leveraging the techniques of the recent fast development of deep learning and fully convolutional networks. The experimental results on a set of high-resolution optical images including Google Earth images and GaoFen-2 satellite images demonstrate that the proposed method is able to generate robust and comprehensive sentence description with desirable speed performance.

Authors

Zhengxia Zou

Zhenwei Shi
