Class-Boundary Alignment for Imbalanced Dataset Learning (Paper)

2003 · Cited by 321
Imbalanced Data Classification Techniques · Machine Learning and Algorithms · Text and Document Classification Technologies

Abstract

In this paper, we propose the class-boundary-alignment algorithm to augment SVMs so that they can handle the imbalanced training-data problems posed by many emerging applications (e.g., image retrieval, video surveillance, and gene profiling). Through a simple example, we first show that SVMs can be ineffective in determining the class boundary when the training instances of the target class are heavily outnumbered by the nontarget training instances. To remedy this problem, we propose to adjust the class boundary either by transforming the kernel function, when the training data can be represented in a vector space, or by modifying the kernel matrix, when the data do not have a vector-space representation (e.g., sequence data). Through theoretical analysis and empirical study, we show that the class-boundary-alignment algorithm works effectively with images (data that have a vector-space representation) and video sequences (data that do not have a vector-space representation).
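To make the idea of "modifying the kernel matrix" concrete, here is a minimal sketch. It is not the algorithm from the paper, whose details the abstract does not give. It assumes one simple possibility: rescaling each entry as K'[i][j] = d_i * d_j * K[i][j], with a larger factor d for target-class (minority) instances. The function names, the RBF kernel, and the factor values are all hypothetical choices made for illustration. A rescaling of this form only needs the matrix entries, so it also applies when the data have no vector-space representation, and it keeps the matrix symmetric and positive semidefinite.

```python
import math

def rbf_kernel_matrix(points, gamma=1.0):
    """Gram matrix with K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    n = len(points)
    return [
        [
            math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(points[i], points[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]

def rescale_kernel_matrix(K, labels, minority_factor=2.0, majority_factor=1.0):
    """Illustrative rescaling (an assumption, not the paper's method):
    K'[i][j] = d_i * d_j * K[i][j], where d_i depends only on the class label.
    Labels are +1 for the target (minority) class and -1 otherwise."""
    d = [minority_factor if y == 1 else majority_factor for y in labels]
    n = len(K)
    return [[d[i] * d[j] * K[i][j] for j in range(n)] for i in range(n)]

# Toy imbalanced set: three nontarget points and one target point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 3.0)]
labels = [-1, -1, -1, 1]

K = rbf_kernel_matrix(points, gamma=0.5)
K_adjusted = rescale_kernel_matrix(K, labels)
```

The adjusted matrix could then be passed to any SVM solver that accepts a precomputed kernel matrix. Whether a class-dependent factor of this kind actually moves the boundary in the desired direction depends on the data and would need to be checked empirically.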