The Annals of Applied Statistics

Smoothing proximal gradient method for general structured sparse regression

Xi Chen, Qihang Lin, Seyoung Kim, Jaime G. Carbonell, and Eric P. Xing



We study the problem of estimating high-dimensional regression models regularized by a structured sparsity-inducing penalty that encodes prior structural information on either the input or the output variables. We consider two widely adopted penalties of this kind as motivating examples: (1) the general overlapping-group-lasso penalty, generalized from the group-lasso penalty; and (2) the graph-guided-fused-lasso penalty, generalized from the fused-lasso penalty. Because both penalties are nonseparable and nonsmooth, developing an efficient optimization method for them remains challenging. In this paper we propose a general optimization approach, the smoothing proximal gradient (SPG) method, which can solve structured sparse regression problems with any smooth convex loss under a wide spectrum of structured sparsity-inducing penalties. Our approach combines a smoothing technique with an effective proximal gradient method. It needs O(1/ε) iterations to reach an ε-accurate solution, significantly faster than the O(1/ε²) of standard first-order methods such as subgradient methods, and it is much more scalable than the widely used interior-point methods. The efficiency and scalability of our method are demonstrated on both simulation experiments and real genetic data sets.
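To make the "smoothing plus proximal gradient" idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a squared loss and a penalty of the graph-guided-fused-lasso type written as ||Cβ||₁ + λ||β||₁, where the matrix `C` (one row per graph edge) is supplied by the caller. The structured term is rewritten as max over ||α||∞ ≤ 1 of αᵀCβ and smoothed with a quadratic term in α (parameter `mu`); the separable λ||β||₁ term is kept exact and handled by soft-thresholding. The function name `spg_l1_structured`, the default `mu`, the stopping rule, and the FISTA-style momentum sequence (Beck and Teboulle, 2009) are illustrative assumptions; the paper's own choice of the smoothing parameter and step sequence may differ.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def spg_l1_structured(X, y, C, lam, mu=1e-3, max_iter=5000, tol=1e-8):
    """Hypothetical SPG-style sketch for
        min_beta 0.5 * ||y - X beta||^2 + ||C beta||_1 + lam * ||beta||_1.

    ||C beta||_1 = max_{||a||_inf <= 1} a^T C beta is replaced by
        max_{||a||_inf <= 1} (a^T C beta - mu / 2 * ||a||^2),
    whose gradient is C^T a*, with a* = clip(C beta / mu, -1, 1).
    """
    p = X.shape[1]
    # Lipschitz constant of the smooth part's gradient: ||X||_2^2 + ||C||_2^2 / mu
    L = np.linalg.norm(X, 2) ** 2 + np.linalg.norm(C, 2) ** 2 / mu
    beta = np.zeros(p)  # last proximal iterate
    w = beta.copy()     # extrapolated (momentum) point
    theta = 1.0
    for _ in range(max_iter):
        # Optimal dual variable of the smoothed penalty at w:
        # projection of C w / mu onto the l_inf unit ball
        alpha = np.clip(C @ w / mu, -1.0, 1.0)
        # Gradient of loss + smoothed structured penalty
        grad = X.T @ (X @ w - y) + C.T @ alpha
        # Proximal gradient step; the exact l1 term is handled by soft-thresholding
        beta_new = soft_threshold(w - grad / L, lam / L)
        # Stop when the gradient mapping L * (w - beta_new) is small
        if L * np.linalg.norm(w - beta_new) <= tol:
            return beta_new
        # FISTA-style momentum update
        theta_new = (1.0 + np.sqrt(1.0 + 4.0 * theta ** 2)) / 2.0
        w = beta_new + ((theta - 1.0) / theta_new) * (beta_new - beta)
        beta, theta = beta_new, theta_new
    return beta


# Two coefficients joined by one edge are pulled together by the fusion term:
# with X = I, y = (1, -1) and C = [[0.5, -0.5]], the minimizer is (0.5, -0.5).
beta = spg_l1_structured(np.eye(2), np.array([1.0, -1.0]),
                         np.array([[0.5, -0.5]]), lam=0.0, max_iter=20000)
print(beta)
```

Every iteration costs only matrix-vector products and a closed-form projection, which is why this approach scales better than interior-point solvers. For an overlapping-group-lasso penalty the same template would apply, with `C` stacking weighted group-selection rows and the `np.clip` step replaced by a projection of each group's block of `C @ w / mu` onto the unit ℓ2 ball. That variant is not shown here.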

Article information

Ann. Appl. Stat. Volume 6, Number 2 (2012), 719-752.

First available in Project Euclid: 11 June 2012



Chen, Xi; Lin, Qihang; Kim, Seyoung; Carbonell, Jaime G.; Xing, Eric P. Smoothing proximal gradient method for general structured sparse regression. Ann. Appl. Stat. 6 (2012), no. 2, 719--752. doi:10.1214/11-AOAS514.


