KNN Classifier with Self Adjusting Memory for Heterogeneous Concept Drift (Paper)
Abstract
Data mining in non-stationary data streams has recently been gaining more attention, especially in the context of the Internet of Things and Big Data. It is a highly challenging task, since the fundamentally different types of drift that may occur undermine classical assumptions such as i.i.d. data or stationary distributions. Available algorithms either struggle with certain forms of drift or require a priori knowledge in the form of a task-specific setting. We propose the Self Adjusting Memory (SAM) model for the k Nearest Neighbor (kNN) algorithm, since kNN is a proven classifier in the streaming setting. SAM-kNN can deal with heterogeneous concept drift, i.e., different drift types and rates, using biologically inspired memory models and their coordination. It can easily be applied in practice, since no optimization of the meta-parameters is necessary. The basic idea is to construct dedicated models for the current and former concepts and to apply them according to the demands of the given situation. An extensive evaluation is conducted on various benchmarks, consisting of artificial streams with known drift characteristics as well as real-world datasets. In doing so, we explicitly add new benchmarks that enable a precise performance evaluation on multiple types of drift. The highly competitive results across all experiments underline the robustness of SAM-kNN as well as its capability to handle heterogeneous concept drift.
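The core idea of keeping dedicated models for the current and former concepts and applying whichever suits the situation can be illustrated with a heavily simplified sketch. This is not SAM-kNN itself: the class `DualMemoryKNN`, the fixed short-term memory size, the sliding-window accuracy used to pick a memory, and all parameter names are assumptions made for illustration only. The sketch also omits the mechanisms the full model uses to size and maintain its memories. It keeps a short-term memory of the most recent samples (current concept), a long-term memory of older samples (former concepts), and a combined memory. Each incoming sample is first used to score all three memories (test-then-train) and then learned.

```python
import math
from collections import Counter, deque


def knn_predict(memory, x, k):
    """Majority vote among the k stored (features, label) pairs closest to x."""
    if not memory:
        return None
    nearest = sorted(memory, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


class DualMemoryKNN:
    """Hypothetical, simplified two-memory kNN; not the SAM-kNN algorithm."""

    def __init__(self, k=3, stm_size=50, window=20):
        self.k = k
        self.stm_size = stm_size
        self.stm = deque(maxlen=stm_size)  # recent samples: current concept
        self.ltm = []  # older samples: former concepts (no cleaning/compression here)
        # Recent hit/miss record per memory, used to choose which one to apply.
        self.hits = {name: deque(maxlen=window) for name in ("stm", "ltm", "cm")}

    def _predictions(self, x):
        stm = list(self.stm)
        return {
            "stm": knn_predict(stm, x, self.k),
            "ltm": knn_predict(self.ltm, x, self.k),
            "cm": knn_predict(stm + self.ltm, x, self.k),  # combined memory
        }

    def _accuracy(self, name):
        record = self.hits[name]
        return sum(record) / len(record) if record else 0.0

    def predict(self, x):
        preds = self._predictions(x)
        # Apply the memory with the best recent accuracy (first one wins ties).
        best = max(preds, key=self._accuracy)
        return preds[best]

    def partial_fit(self, x, y):
        # Test-then-train: score every memory on this sample before learning it.
        for name, pred in self._predictions(x).items():
            self.hits[name].append(pred == y)
        if len(self.stm) == self.stm_size:
            # The oldest short-term sample is about to be evicted; keep it long-term.
            self.ltm.append(self.stm[0])
        self.stm.append((x, y))
```

In a usage such as feeding one concept (say `label = int(x >= 0.5)`) followed by an abrupt switch to the flipped concept, the short-term memory soon holds only post-drift samples, scores best on the recent window, and is selected for prediction, while the pre-drift samples remain available in the long-term memory in case the old concept recurs.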