Learning the k in k-means (paper)

2003 · Citations: 620
Advanced Clustering Algorithms Research · Face and Expression Recognition · Data Mining Algorithms and Applications

Abstract

When clustering a dataset, the right number $k$ of clusters to use is often not obvious, and choosing $k$ automatically is a hard algorithmic problem. In this paper we present a new algorithm for choosing $k$ that is based on a new statistical test for the hypothesis that a subset of data follows a Gaussian distribution. The algorithm runs k-means with increasing $k$ until the test fails to reject the hypothesis that the data assigned to each k-means center are Gaussian. We present results from experiments on synthetic and real-world data showing that the algorithm works well, and better than a recent method based on the BIC penalty for model complexity.

Pre-2018 CSE ID: CS2002-0716
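The abstract describes the loop only at a high level: start with one center, run k-means, test each cluster for Gaussianity, split the clusters that fail, and repeat until none fail (the paper calls this algorithm G-means). The pure-Python sketch below illustrates that loop under assumptions the abstract does not state: the per-cluster test is taken to be an Anderson-Darling normality test on the cluster's points projected onto the line joining two child centers found by 2-means, using the small-sample correction `1 + 4/n - 25/n**2` and a default critical value of 1.8692 (which we believe corresponds to a significance level of about 0.0001; check both against the full paper). Children are seeded by offsetting the parent along its highest-variance coordinate axis, a simplification. All function names (`gmeans`, `try_split`, `anderson_darling`, and so on) are hypothetical.

```python
import math
import random

def dist2(a, b):
    """Squared Euclidean distance between two points (tuples)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def normal_cdf(z):
    """Standard normal CDF via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    d = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(d))

def assign(points, centers):
    """Group each point with its nearest center."""
    groups = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda j: dist2(p, centers[j]))
        groups[j].append(p)
    return groups

def kmeans(points, centers, iters=100):
    """Lloyd's k-means started from the given centers; empty clusters keep their center."""
    centers = list(centers)
    for _ in range(iters):
        groups = assign(points, centers)
        new = [mean(g) if g else c for g, c in zip(groups, centers)]
        if new == centers:
            break
        centers = new
    return centers, assign(points, centers)

def anderson_darling(xs):
    """Anderson-Darling normality statistic with mean and variance estimated from xs."""
    n = len(xs)
    mu = sum(xs) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / (n - 1))
    if sd == 0:
        return 0.0  # all values identical: nothing to split
    eps = 1e-12  # keep CDF values away from 0 and 1 so log() is defined
    f = sorted(min(max(normal_cdf((x - mu) / sd), eps), 1 - eps) for x in xs)
    a2 = -n - sum((2 * i - 1) * (math.log(f[i - 1]) + math.log(1 - f[n - i]))
                  for i in range(1, n + 1)) / n
    return a2 * (1 + 4 / n - 25 / n ** 2)  # assumed small-sample correction

def try_split(points, c, critical):
    """Return two child centers if the points owned by c fail the Gaussian test, else None."""
    if len(points) < 8:  # too few points to test (assumed cutoff)
        return None
    d = len(c)
    var = [sum((p[i] - c[i]) ** 2 for p in points) / len(points) for i in range(d)]
    axis = max(range(d), key=lambda i: var[i])
    off = math.sqrt(2 * var[axis] / math.pi)  # simplified child offset
    seeds = [tuple(x + s * (off if i == axis else 0.0) for i, x in enumerate(c))
             for s in (1, -1)]
    (c1, c2), _ = kmeans(points, seeds)  # 2-means inside this cluster
    v = [a - b for a, b in zip(c1, c2)]
    vv = sum(x * x for x in v)
    if vv == 0:
        return None
    # Project every point onto the line joining the two children, then test 1-D normality.
    proj = [sum(x * y for x, y in zip(p, v)) / vv for p in points]
    return None if anderson_darling(proj) <= critical else (c1, c2)

def gmeans(points, critical=1.8692, max_k=50):
    """Increase k by splitting non-Gaussian clusters until every cluster passes the test."""
    centers = [mean(points)]
    while True:
        centers, groups = kmeans(points, centers)
        if len(centers) >= max_k:  # safety cap
            return centers
        new_centers = []
        for c, g in zip(centers, groups):
            kids = try_split(g, c, critical)
            new_centers.extend(kids if kids else [c])
        if len(new_centers) == len(centers):  # no cluster was split
            return centers
        centers = new_centers

# Demo on synthetic data: three well-separated 2-D Gaussian blobs (illustration only).
random.seed(0)
data = [(random.gauss(cx, 1.0), random.gauss(cy, 1.0))
        for cx, cy in [(0, 0), (10, 0), (0, 10)] for _ in range(200)]
found = gmeans(data)
print(len(found), "clusters:", [tuple(round(v, 2) for v in c) for c in found])
```

The design choice worth noting is that the test is always one-dimensional: projecting onto the direction separating the two candidate children turns a multivariate Gaussianity question into a univariate one that a standard normality test can answer. Lowering `critical` makes splits more likely and so tends to produce a larger $k$.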