A Central Limit Theorem for $k$-Means Clustering (paper)

1982, The Annals of Probability, cited by 241
Topics: Bayesian Methods and Mixture Models; Statistical Methods and Inference; Advanced Clustering Algorithms Research

Abstract

A set of $n$ points in Euclidean space is partitioned into the $k$ groups that minimize the within-group sum of squares. Under the assumption that the $n$ points come from independent sampling on a fixed distribution, conditions are found to assure asymptotic normality of the vector of means of the $k$ groups. The method of proof makes novel application of a functional central limit theorem for empirical processes, a generalization of Donsker's theorem due to Dudley.
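To make the object of the theorem concrete, here is a minimal sketch of the quantity the abstract describes, restricted to the simplest case: points on the real line with $k = 2$. It is an illustration under stated assumptions, not the paper's method. In one dimension the optimal two groups are contiguous in sorted order, so trying every split point finds the exact minimizer of the within-group sum of squares. The function names `within_ss`, `two_means_1d`, and `simulate_means` are hypothetical, and the standard normal sampling distribution is an assumption chosen for illustration.

```python
import random


def within_ss(points):
    """Sum of squared deviations of the points from their own mean."""
    m = sum(points) / len(points)
    return sum((x - m) ** 2 for x in points)


def two_means_1d(sample):
    """Exact 2-means on the real line (illustrative helper, not from the paper).

    Sorts the sample and tries every split into a lower and an upper group,
    keeping the split with the smallest total within-group sum of squares.
    Returns (cost, lower group mean, upper group mean).
    """
    xs = sorted(sample)
    best = None
    for i in range(1, len(xs)):
        left, right = xs[:i], xs[i:]
        cost = within_ss(left) + within_ss(right)
        if best is None or cost < best[0]:
            best = (cost, sum(left) / len(left), sum(right) / len(right))
    return best


def simulate_means(n, reps, seed=0):
    """Draw `reps` independent samples of size n from N(0, 1) (an assumed
    distribution) and return the pair of optimal group means for each."""
    rng = random.Random(seed)
    results = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        _, lower, upper = two_means_1d(sample)
        results.append((lower, upper))
    return results
```

Calling `simulate_means(200, 500)` and plotting a histogram of the lower (or upper) means is one way to see the sampling variability that the theorem describes; the abstract's claim is that, under suitable conditions, the vector of group means is asymptotically normal as $n$ grows. The split search is quadratic in $n$, which is acceptable for a small demonstration but not for large samples.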