Convex Formulation for Learning from Positive and Unlabeled Data (Paper)

2015, International Conference on Machine Learning. Citations: 237

Machine Learning and Data Classification; Machine Learning and Algorithms; Domain Adaptation and Few-Shot Learning

Artificial Intelligence: Domain Adaptation and Few-Shot Learning; Machine Learning and Data Classification; Machine Learning and Algorithms

Abstract

We discuss binary classification from only positive and unlabeled data (PU classification), which is conceivable in various real-world machine learning problems. Since unlabeled data consists of both positive and negative data, simply separating positive and unlabeled data yields a biased solution. Recently, it was shown that the bias can be canceled by using a particular non-convex loss such as the ramp loss. However, classifier training with a non-convex loss is not straightforward in practice. In this paper, we discuss a convex formulation for PU classification that can still cancel the bias. The key idea is to use different loss functions for positive and unlabeled samples. However, in this setup, the hinge loss is not permissible. As an alternative, we propose the double hinge loss. Theoretically, we prove that the estimators converge to the optimal solutions at the optimal parametric rate. Experimentally, we demonstrate that PU classification with the double hinge loss performs as accurately as the non-convex method, with a much lower computational cost.
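The abstract does not state the loss or the objective explicitly, so the following is only a minimal sketch under stated assumptions. It assumes the double hinge loss has the form max(-z, max(0, 1/2 - z/2)), that positive samples are scored with the composite loss l(z) - l(-z) (which reduces to the linear function -z for this loss, keeping the objective convex), and that unlabeled samples are scored with l(-z). The function names and the known class prior `prior` are hypothetical choices made for illustration.

```python
def double_hinge(z):
    # Assumed form of the double hinge loss: max(-z, max(0, 1/2 - z/2)).
    return max(-z, max(0.0, 0.5 - 0.5 * z))


def composite_loss(z):
    # Loss applied to positive samples: l(z) - l(-z).
    # For the double hinge loss this equals -z, which is linear in z.
    return double_hinge(z) - double_hinge(-z)


def pu_risk(scores_pos, scores_unl, prior):
    # Empirical PU risk sketch (assumed form):
    #   prior * mean over positives of composite_loss(g(x))
    #   + mean over unlabeled of double_hinge(-g(x))
    # `prior` is the class prior of the positive class, assumed known here.
    pos_term = prior * sum(composite_loss(s) for s in scores_pos) / len(scores_pos)
    unl_term = sum(double_hinge(-s) for s in scores_unl) / len(scores_unl)
    return pos_term + unl_term


if __name__ == "__main__":
    # Toy classifier scores g(x), not data from the paper.
    print(pu_risk([1.0, 3.0], [-1.0, 1.0], prior=0.5))
```

Because the positive term is linear and the unlabeled term is convex in the score, the whole objective is convex whenever the score is linear in the model parameters. This is the property the abstract contrasts with the non-convex ramp loss.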

Authors (3)

Masashi Sugiyama

Gang Niu

Marthinus Du Plessis
