Electronic Journal of Statistics

Sparse regression with exact clustering

Yiyuan She

Full-text: Open access


This paper studies a generic sparse regression problem with a customizable sparsity pattern matrix, motivated by, but not limited to, a supervised gene clustering problem in microarray data analysis. The clustered lasso method is proposed with the l1-type penalties imposed on both the coefficients and their pairwise differences. Somewhat surprisingly, it behaves differently than the lasso or the fused lasso – the exact clustering effect expected from the l1 penalization is rarely seen in applications. An asymptotic study is performed to investigate the power and limitations of the l1-penalty in sparse regression. We propose to combine data-augmentation and weights to improve the l1 technique. To address the computational issues in high dimensions, we successfully generalize a popular iterative algorithm both in practice and in theory and propose an ‘annealing’ algorithm applicable to generic sparse regressions (including the fused/clustered lasso). Some effective accelerating techniques are further investigated to boost the convergence. The accelerated annealing (AA) algorithm, involving only matrix multiplications and thresholdings, can handle a large design matrix as well as a large sparsity pattern matrix.

Article information

Electron. J. Statist., Volume 4 (2010), 1055-1096.

First available in Project Euclid: 12 October 2010

Permanent link to this document

Digital Object Identifier

Mathematical Reviews number (MathSciNet)

Zentralblatt MATH identifier

Primary: 62J07: Ridge regression; shrinkage estimators 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]

Sparsity clustering thresholding lasso


She, Yiyuan. Sparse regression with exact clustering. Electron. J. Statist. 4 (2010), 1055--1096. doi:10.1214/10-EJS578. https://projecteuclid.org/euclid.ejs/1286889184

Export citation


  • [1] Amaldi, E. and Kann, V. (1998), “On the approximability of minimizing non zero variables or unsatisfied relations in linear systems,”, Theoretical Computer Science, 209, 237–260.
  • [2] Anderson, T. (1971), The Statistical Analysis of Time Series, New York: Wiley.
  • [3] Bazaraa, M. S. and Shetty, C. M. (1979), Nonlinear Programming: Theory and Algorithms, John Wiley & Sons.
  • [4] Bertsekas, D. (1999), Nonlinear Programming, Athena Scientific.
  • [5] Bondell, H. D. and Reich, B. J. (2008), “Simultaneous regression shrinkage, variable selection and clustering of predictors with OSCAR,”, Biometrics, 64, 115–123.
  • [6] Boyd, S. and Vandenberghe, L. (2004), Convex Optimization, Cambridge, MA: Cambridge University Press.
  • [7] Browder, F. E. and Petryshyn, W. V. (1967), “Construction of fixed points of nonlinear mappings in Hilbert space,”, Journal of Mathematical Analysis and Applications, 20, 197–228.
  • [8] Bunea, F., Tsybakov, A. B., and Wegkamp, M. (2007), “Sparsity oracle inequalities for the lasso,”, Electronic Journal of Statistics, 1, 169–194.
  • [9] Candès, E. and Tao, T. (2005), “The Dantzig selector: statistical estimation when p is much smaller than n,”, Annals of Statistics, 35, 2392–2404.
  • [10] Daubechies, I., Defrise, M., and De Mol, C. (2004), “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,”, Communications on Pure and Applied Mathematics, 57, 1413–1457.
  • [11] Dettling, M. and Bühlmann, P. (2004), “Finding Predictive Gene Groups from Microarray Data,”, Journal of Multivariate Analysis, 90, 106–131.
  • [12] Dotson Jr., W. (1970), “On the Mann iterative process,”, Trans. Amer. Math. Soc., 149, 65–73.
  • [13] Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004), “Least Angle Regression,”, Annals of Statistics, 32, 407–499.
  • [14] Friedman, J., Hastie, T., Hofling, H., and Tibshirani, R. (2007), “Pathwise coordinate optimization,”, Annals of Applied Statistics, 1, 302–332.
  • [15] Fu, W. (1998), “Penalized regressions: the bridge vs the lasso,”, JCGS, 7, 397–416.
  • [16] Geyer, C. J. (1996), “On the Asymptotics of Convex Stochastic, Optimization”.
  • [17] Jörnsten, R. and Yu, B. (2003), “Simultaneous Gene Clustering and Subset Selection,”, Bioinformatics, 19, 1100–1109.
  • [18] Knight, K. and Fu, W. (2000), “Asymptotics for lasso-type estimators,”, Annals of Statistics, 28, 1356–1378.
  • [19] Mann, W. R. (1953), “Mean value methods in iteration,”, Proc. Amer. Math. Soc., 4, 506–510.
  • [20] Nolte, A. and Schrader, R. (2000), “A note on the finite time behavior of simulated annealing,”, Math. Operat. Res., 25, 476–484.
  • [21] Opial, Z. (1967), “Weak convergence of the sequence of successive approximations for nonexpansive mappings,”, Bull. Amer. Math. Soc., 73, 591–597.
  • [22] Osborne, M., Presnell, B., and Turlach, B. (2000), “On the LASSO and its Dual,”, J. Comput. Graph. Statist., 9, 319–337.
  • [23] She, Y. (2008), “Sparse Regression with Exact Clustering,” Ph.D. thesis, Stanford, University.
  • [24] She, Y. (2009), “Thresholding-based Iterative Selection Procedures for Model Selection and Shrinkage,”, Electronic Journal of Statistics, 3, 384–415.
  • [25] Shimizu, K., Ishizuka, Y., and Bard, J. (1997), Nondifferentiable and Two-Level Mathematical Programming, Kluwer Academic Publishers.
  • [26] Tibshirani, R. (1996), “Regression Shrinkage and Selection via the Lasso,”, JRSSB, 58, 267–288.
  • [27] Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., and Knight, K. (2005), “Sparsity and smoothness via the fused lasso,”, JRSSB, 67, 91–108.
  • [28] Wrinkler, G. (1990), “An Ergodic, L2-Theorem for Simulated Annealing in Bayesian Image Reconstruction,” Journal of Applied Probability, 28, 779–791.
  • [29] Wu, T. and Lange, K. (2008), “Coordinate Descent Algorithm for Lasso Penalized Regression,”, Ann. Appl. Stat., 2, 224–244.
  • [30] Yuan, M. and Lin, Y. (2006), “Model selection and estimation in regression with grouped variables,”, JRSSB, 68, 49–67.
  • [31] Zhang, C.-H. and Huang, J. (2008), “The sparsity and bias of the Lasso selection in high-dimensional linear regression,”, Ann. Statist, 36, 1567–1594.
  • [32] Zhao, P. and Yu, B. (2006a), “Grouped and Hierarchical Model Selection through Composite Absolute Penalties,” Tech. rep., Dept. of Statistics, University of California, Berkeley.
  • [33] Zhao, P. and Yu, B. (2006b), “On Model Selection Consistency of Lasso,”, Journal of Machine Learning Research, 7, 2541–2563.
  • [34] Zou, H. (2006), “The Adaptive Lasso and Its Oracle Properties,”, JASA, 101, 1418–1429.
  • [35] Zou, H. and Hastie, T. (2005), “Regularization and Variable Selection via the Elastic Net,”, JRSSB, 67, 301–320.
  • [36] Zou, H. and Li, R. (2008), “One-step Sparse Estimates in Nonconcave Penalized Likelihood Models,”, Annals of Statistics, 1509–1533.