Method for Determining the Optimal Number of Clusters Based on Agglomerative Hierarchical Clustering (Paper)

2016 | IEEE Transactions on Neural Networks and Learning Systems | Cited by 220
Advanced Clustering Algorithms Research | Complex Network Analysis Techniques | Text and Document Classification Technologies

Abstract

Determining the optimal number of clusters is crucial to clustering quality in cluster analysis. From the standpoint of sample geometry, two concepts, i.e., the sample clustering dispersion degree and the sample clustering synthesis degree, are defined, and a new clustering validity index is designed. Moreover, a method for determining the optimal number of clusters based on an agglomerative hierarchical clustering (AHC) algorithm is proposed. The new index and the method can evaluate the clustering results produced by AHC and determine the optimal number of clusters for multiple types of datasets, including those with linear, manifold, annular, and convex structures. Theoretical analysis and experimental results indicate the validity and good performance of the proposed index and the method.
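
The general workflow the abstract describes (run AHC once, score the partition at each candidate number of clusters with a validity index, keep the best-scoring one) can be illustrated with a minimal Python sketch. The abstract does not define the dispersion degree, the synthesis degree, or the proposed index, so this sketch substitutes the well-known mean silhouette width as a stand-in; it is not the paper's index. The average-linkage criterion, the toy data, and the names `ahc_levels`, `silhouette`, `choose_k`, and `make_blobs` are likewise assumptions made for illustration only.

```python
import math
import random


def ahc_levels(points):
    """Average-linkage AHC (an assumed linkage choice).

    Returns {k: clusters} for every k from n down to 1, where each
    cluster is a list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    levels = {len(clusters): [list(c) for c in clusters]}
    while len(clusters) > 1:
        best = None  # (average distance, i, j) of the closest cluster pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(math.dist(points[a], points[b])
                            for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into i
        del clusters[j]
        levels[len(clusters)] = [list(c) for c in clusters]
    return levels


def mean_dist(points, i, members):
    """Mean distance from point i to the points indexed by members."""
    return sum(math.dist(points[i], points[j]) for j in members) / len(members)


def silhouette(points, clusters):
    """Mean silhouette width: a stand-in validity index, higher is better."""
    total = 0.0
    for c, own in enumerate(clusters):
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        for i in own:
            a = sum(math.dist(points[i], points[j])
                    for j in own if j != i) / (len(own) - 1)
            b = min(mean_dist(points, i, other)
                    for k, other in enumerate(clusters) if k != c)
            if max(a, b) > 0:
                total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, k_max):
    """Score every partition with 2..k_max clusters and keep the best one."""
    levels = ahc_levels(points)
    scores = {k: silhouette(points, levels[k]) for k in range(2, k_max + 1)}
    best_k = max(scores, key=scores.get)
    return best_k, scores, levels[best_k]


def make_blobs(seed=0):
    """Toy data (hypothetical): three well-separated 2-D Gaussian blobs."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    return [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in centers for _ in range(10)]


if __name__ == "__main__":
    best_k, scores, _ = choose_k(make_blobs(), k_max=6)
    for k in sorted(scores):
        print(f"k={k}: silhouette={scores[k]:.3f}")
    print("selected k:", best_k)
```

Because the merge sequence is computed once and reused for every candidate k, swapping in a different validity index only requires replacing `silhouette`. Note that the silhouette width tends to favor compact, convex clusters, so this stand-in would not be expected to handle the annular or manifold structures that the paper's index targets.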