Semi-supervised Feature Selection via Spectral Analysis (Paper)

2007 | Citations: 300
Face and Expression Recognition; Machine Learning and Data Classification; Text and Document Classification Technologies

Abstract

Feature selection is an important task in effective data mining. A new challenge to feature selection is the so-called “small labeled-sample problem”, in which labeled data are scarce and unlabeled data are abundant. The paucity of labeled instances provides insufficient information about the structure of the target concept and can cause supervised feature selection algorithms to fail. Unsupervised feature selection algorithms can work without labeled data. However, these algorithms ignore label information, which may lead to performance deterioration. In this work, we propose to use both (small) labeled and (large) unlabeled data in feature selection, a topic that has not yet been addressed in feature selection research. We present a semi-supervised feature selection algorithm based on spectral analysis. The algorithm exploits both labeled and unlabeled data through a regularization framework, which provides an effective way to address the “small labeled-sample” problem. Experimental results demonstrate the efficacy of our approach and confirm that small labeled samples can help feature selection with unlabeled data.