Open Access
2009 Penalized model-based clustering with unconstrained covariance matrices
Hui Zhou, Wei Pan, Xiaotong Shen
Electron. J. Statist. 3: 1473-1496 (2009). DOI: 10.1214/09-EJS487

Abstract

Clustering is one of the most useful tools for high-dimensional analysis, e.g., for microarray data. It becomes challenging in the presence of a large number of noise variables, which may mask underlying clustering structures. Therefore, noise removal through variable selection is necessary. One effective way is regularization for simultaneous parameter estimation and variable selection in model-based clustering. However, existing methods focus on regularizing the mean parameters representing the centers of clusters and ignore dependencies among variables within clusters, which can lead to incorrect orientations or shapes of the resulting clusters. In this article, we propose a regularized Gaussian mixture model with general covariance matrices, taking various dependencies into account. At the same time, this approach shrinks the means and covariance matrices, achieving better clustering and variable selection. To overcome one technical challenge in estimating possibly large covariance matrices, we derive an EM algorithm that utilizes the graphical lasso (Friedman et al. 2007) for parameter estimation. Numerical examples, including applications to microarray gene expression data, demonstrate the utility of the proposed method.
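
As a rough sketch of the kind of objective the abstract describes, one can write a penalized log-likelihood for a $K$-component Gaussian mixture with $L_1$ penalties (as the keywords suggest) on the cluster means and on the entries of the inverse covariance matrices. The notation here ($\pi_k$, $\mu_k$, $\Sigma_k$, $W_k$, $\lambda_1$, $\lambda_2$) is illustrative and assumed, not quoted from the abstract:

$$
\log L_P(\Theta) = \sum_{i=1}^{n} \log\Big[\sum_{k=1}^{K} \pi_k\, \phi(x_i;\mu_k,\Sigma_k)\Big] - \lambda_1 \sum_{k=1}^{K}\sum_{j=1}^{p} |\mu_{kj}| - \lambda_2 \sum_{k=1}^{K}\sum_{j,l=1}^{p} |W_{k;jl}|, \qquad W_k = \Sigma_k^{-1},
$$

where $\phi(\cdot;\mu_k,\Sigma_k)$ is the multivariate normal density. Under this assumed form, the M-step update of each $W_k$ is an $L_1$-penalized Gaussian likelihood problem on a sample covariance matrix weighted by the posterior cluster probabilities, which is the type of problem the graphical lasso solves.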

Citation

Hui Zhou, Wei Pan, Xiaotong Shen. "Penalized model-based clustering with unconstrained covariance matrices." Electron. J. Statist. 3: 1473-1496, 2009. https://doi.org/10.1214/09-EJS487

Information

Published: 2009
First available in Project Euclid: 4 January 2010

zbMATH: 1326.62143
MathSciNet: MR2578834
Digital Object Identifier: 10.1214/09-EJS487

Subjects:
Primary: 62H30

Keywords: $L_1$ penalization, covariance estimation, EM algorithm, Gaussian graphical models, high-dimension but low-sample size, normal mixtures, penalized likelihood, semi-supervised learning

Rights: Copyright © 2009 The Institute of Mathematical Statistics and the Bernoulli Society