Semi-Supervised Learning in Gigantic Image Collections (paper)
Abstract
With the advent of the Internet it is now possible to collect hundreds of millions of images. These images come with varying degrees of label information. “Clean labels” can be manually obtained on a small fraction, “noisy labels” may be extracted automatically from surrounding text, while for most images there are no labels at all. Semi-supervised learning is a principled framework for combining these different label sources. However, it scales polynomially with the number of images, making it impractical for use on gigantic collections with hundreds of millions of images and thousands of classes. In this paper we show how to utilize recent results in machine learning to obtain highly efficient approximations for semi-supervised learning. Specifically, we use the convergence of the eigenvectors of the normalized graph Laplacian to eigenfunctions of weighted Laplace-Beltrami operators. We combine this with a label sharing framework obtained from Wordnet to propagate label information to classes lacking manual annotations. Our algorithm enables us to apply semi-supervised learning to a database of 80 million images with 74 thousand classes.
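
To make the graph Laplacian idea concrete, the following is a minimal sketch of semi-supervised learning in a reduced eigenvector basis. It assumes a dataset small enough that the exact eigenvectors of the normalized graph Laplacian can be computed directly; it does not implement the eigenfunction approximation or the Wordnet label sharing that the abstract describes. The function name `ssl_reduced_basis` and the parameters `k`, `eps`, and `lam` are hypothetical choices made for illustration, as is the Gaussian affinity.

```python
import numpy as np


def ssl_reduced_basis(X, y, labeled, k=5, eps=1.0, lam=100.0):
    """Graph-based semi-supervised learning in a k-eigenvector basis.

    Illustrative assumptions (not taken from the abstract):
      X       : (n, d) array of feature vectors
      y       : (n,) array of targets, +1 / -1 on labeled points, 0 elsewhere
      labeled : indices of the labeled points
      k       : number of smallest Laplacian eigenvectors to keep
      eps     : bandwidth of the Gaussian affinity
      lam     : weight of the label-fitting term
    Returns an (n,) array of real-valued label scores.
    """
    n = X.shape[0]

    # Dense Gaussian affinity matrix W; O(n^2) memory, small n only.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * eps ** 2))

    # Normalized graph Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; keep the k smallest.
    evals, evecs = np.linalg.eigh(L)
    U = evecs[:, :k]
    S = np.diag(evals[:k])

    # Diagonal label weights: lam on labeled points, 0 on unlabeled ones.
    weights = np.zeros(n)
    weights[labeled] = lam

    # Minimize f^T L f + (f - y)^T Lambda (f - y) subject to f = U alpha,
    # which gives the k x k system (S + U^T Lambda U) alpha = U^T Lambda y.
    A = S + U.T @ (weights[:, None] * U)
    b = U.T @ (weights * y)
    alpha = np.linalg.solve(A, b)
    return U @ alpha
```

Taking the sign of the returned scores gives a binary prediction for every point, labeled or not. The only linear system solved is k by k, but building `W` and calling `np.linalg.eigh` still cost time and memory polynomial in the number of images, which is the bottleneck the abstract says makes the standard approach impractical at the scale of tens of millions of images.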