Handling class imbalance problem using oversampling techniques: A review
Abstract
The objective of a classifier is to assign the objects of a data set to one or more classes based on their characteristics. In real-life applications, classifiers are often applied to unbalanced data sets, in which some classes (the minority classes) have far fewer instances than others (the majority classes). Classification algorithms tend to be highly accurate on the majority classes but significantly less accurate on the minority classes, so unbalanced data sets degrade the performance of traditional classification algorithms. This is known as the class imbalance problem. To address it, techniques have been proposed at the data level, the algorithm level, and the hybrid level. The most commonly used data-level balancing techniques are oversampling and undersampling. In this paper we compare several oversampling techniques, namely SMOTE (Synthetic Minority Over-sampling Technique), ADASYN, Borderline-SMOTE, and Safe-Level-SMOTE, by applying different classifiers and observing various performance metrics.
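To make the idea behind the oversampling techniques named above concrete, the following is a minimal sketch of SMOTE-style interpolation in pure Python. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the function name `smote`, its parameters, and the toy data are hypothetical, and the base point for each synthetic sample is chosen at random rather than by iterating over every minority sample as in the original SMOTE formulation.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (simplified SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))  # assumption: base point picked at random
        base = minority[i]
        # Indices of the other minority points, sorted by Euclidean distance to base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical toy data: 4 minority samples against a majority class of size 10.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
majority_size = 10
new_points = smote(minority, n_new=majority_size - len(minority), k=3)
balanced_minority = minority + new_points
print(len(balanced_minority))  # 10, matching majority_size
```

Because each synthetic point lies on a line segment between two existing minority samples, the new points stay inside the region already occupied by the minority class. Variants such as Borderline-SMOTE, Safe-Level-SMOTE, and ADASYN differ mainly in which minority samples they choose as bases and how many synthetic points they generate around each.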