Penalized Model-Based Clustering with Application to Variable Selection 论文
2007引用 282
Gene expression and cancer classificationBayesian Methods and Mixture ModelsBioinformatics and Genomic Networks
摘要
Variable selection in clustering analysis is both challenging and important. In the context of modelbased clustering analysis with a common diagonal covariance matrix, which is especially suitable for “high dimension, low sample size ” settings, we propose a penalized likelihood approach with an L1 penalty function, automatically realizing variable selection via thresholding and delivering a sparse solution. We derive an EM algorithm to fit our proposed model, and propose a modified BIC as a model selection criterion to choose the number of components and the penalization parameter. A simulation study and an application to gene function prediction with gene expression profiles demonstrate the utility of our method.