Electronic Journal of Statistics

Statistical properties of convex clustering

Kean Ming Tan and Daniela Witten



In this manuscript, we study the statistical properties of convex clustering. We establish that convex clustering is closely related to single linkage hierarchical clustering and $k$-means clustering. In addition, we derive the range of the tuning parameter for convex clustering that yields a non-trivial solution. We also provide an unbiased estimator of the degrees of freedom and a finite sample bound for the prediction error of convex clustering. Finally, we compare convex clustering to several traditional clustering methods in simulation studies.
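
For orientation, convex clustering is commonly formulated as the optimization problem sketched below. The abstract itself does not state the objective, so the notation is an assumption: $X$ is the $n \times p$ data matrix with rows $X_{i\cdot}$, $U$ is the matrix of fitted row centroids, $\lambda \ge 0$ is the tuning parameter, and $q \in \{1, 2, \infty\}$ indexes the norm used in the fusion penalty.

$$
\underset{U \in \mathbb{R}^{n \times p}}{\text{minimize}} \;\; \frac{1}{2} \sum_{i=1}^{n} \| X_{i\cdot} - U_{i\cdot} \|_2^2 \; + \; \lambda \sum_{i < i'} \| U_{i\cdot} - U_{i'\cdot} \|_q
$$

Under this formulation, observations $i$ and $i'$ are assigned to the same cluster when the fitted rows $U_{i\cdot}$ and $U_{i'\cdot}$ coincide. At $\lambda = 0$ the solution is $U = X$, so every observation is its own cluster; for sufficiently large $\lambda$ all rows of $U$ are equal and a single cluster results. The "non-trivial" range of the tuning parameter mentioned in the abstract lies between these two extremes.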

Article information

Electron. J. Statist., Volume 9, Number 2 (2015), 2324-2347.

Received: March 2015
First available in Project Euclid: 14 October 2015


Keywords: degrees of freedom; fusion penalty; hierarchical clustering; $k$-means; prediction error; single linkage


Tan, Kean Ming; Witten, Daniela. Statistical properties of convex clustering. Electron. J. Statist. 9 (2015), no. 2, 2324--2347. doi:10.1214/15-EJS1074. https://projecteuclid.org/euclid.ejs/1444828331


