### Smoothing proximal gradient method for general structured sparse regression

Xi Chen, Qihang Lin, Seyoung Kim, Jaime G. Carbonell, and Eric P. Xing
Source: Ann. Appl. Stat. Volume 6, Number 2 (2012), 719-752.

#### Abstract

We study the problem of estimating high-dimensional regression models regularized by a structured sparsity-inducing penalty that encodes prior structural information on either the input or output variables. We consider two widely adopted types of penalties of this kind as motivating examples: (1) the general overlapping-group-lasso penalty, generalized from the group-lasso penalty; and (2) the graph-guided-fused-lasso penalty, generalized from the fused-lasso penalty. For both types of penalties, due to their nonseparability and nonsmoothness, developing an efficient optimization method remains a challenging problem. In this paper we propose a general optimization approach, the smoothing proximal gradient (SPG) method, which can solve structured sparse regression problems with any smooth convex loss under a wide spectrum of structured sparsity-inducing penalties. Our approach combines a smoothing technique with an effective proximal gradient method. It achieves a convergence rate significantly faster than that of standard first-order approaches such as subgradient methods, and it is much more scalable than the most widely used interior-point methods. The efficiency and scalability of our method are demonstrated on both simulation experiments and real genetic data sets.
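To make the idea concrete, the sketch below illustrates the SPG recipe on the graph-guided-fused-lasso case: the nonsmooth structured penalty $\gamma\|C\beta\|_1$ (with $C$ an edge-incidence matrix) is replaced by its Nesterov smooth approximation, whose gradient is available in closed form, and an accelerated proximal gradient (FISTA-style) loop handles the remaining separable $\ell_1$ penalty. This is a minimal illustration under our own assumptions, not the authors' reference implementation; the function name `spg_fused`, the default `mu`, and the iteration count are all choices made here for the example.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (handles the separable lasso penalty)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def spg_fused(X, y, C, gamma, lam, mu=1e-3, n_iter=1000):
    """Smoothing proximal gradient sketch for
        min_b  0.5 * ||y - X b||^2 + gamma * ||C b||_1 + lam * ||b||_1.

    The term gamma * ||C b||_1 = max_{||a||_inf <= 1} <a, gamma * C b> is
    smoothed by subtracting (mu/2) * ||a||^2 inside the max; the maximizer
    is then a simple clipping, and the smoothed term becomes differentiable.
    """
    p = X.shape[1]
    # Lipschitz constant of the smoothed objective's gradient:
    # lambda_max(X'X) plus gamma^2 * ||C||^2 / mu from the smoothed penalty.
    L = np.linalg.norm(X, 2) ** 2 + gamma ** 2 * np.linalg.norm(C, 2) ** 2 / mu
    b = np.zeros(p)
    w = b.copy()
    t = 1.0
    for _ in range(n_iter):
        # Optimal dual variable of the smoothed penalty: projection of
        # gamma * C w / mu onto the l_inf unit ball.
        alpha = np.clip(gamma * (C @ w) / mu, -1.0, 1.0)
        grad = X.T @ (X @ w - y) + gamma * (C.T @ alpha)
        # Proximal (soft-threshold) step for the lasso part, then
        # FISTA momentum update.
        b_new = soft_threshold(w - grad / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        w = b_new + ((t - 1.0) / t_new) * (b_new - b)
        b, t = b_new, t_new
    return b
```

For example, on a chain graph over four coefficients with responses clustered as (1, 1, 0, 0), the fusion penalty drives adjacent coefficients toward a common value, so the estimate comes out approximately piecewise constant. The smoothing parameter `mu` trades approximation accuracy for a smaller Lipschitz constant (hence larger steps); the paper's analysis ties its choice to the target accuracy.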


Permanent link to this document: http://projecteuclid.org/euclid.aoas/1339419614
Digital Object Identifier: doi:10.1214/11-AOAS514
Zentralblatt MATH identifier: 06062737
Mathematical Reviews number (MathSciNet): MR2976489

### References

Abate, N., Chandalia, M., Satija, P., Adams-Huet, B. et al. (2005). ENPP1/PC-1 K121Q polymorphism and genetic susceptibility to type 2 diabetes. Diabetes 54 1207–1213.
Beck, A. and Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2 183–202.
Mathematical Reviews (MathSciNet): MR2486527
Zentralblatt MATH: 1175.94009
Digital Object Identifier: doi:10.1137/080716542
Bertsekas, D. (1999). Nonlinear Programming. Athena Scientific, Nashua, NH.
Duchi, J. and Singer, Y. (2009). Efficient online and batch learning using forward backward splitting. J. Mach. Learn. Res. 10 2899–2934.
Mathematical Reviews (MathSciNet): MR2579916
Zentralblatt MATH: 1235.62151
Friedman, J., Hastie, T. and Tibshirani, R. (2010). A note on the group lasso and a sparse group lasso. Technical report, Dept. Statistics, Stanford Univ.
Friedman, J., Hastie, T., Höfling, H. and Tibshirani, R. (2007). Pathwise coordinate optimization. Ann. Appl. Stat. 1 302–332.
Mathematical Reviews (MathSciNet): MR2415737
Zentralblatt MATH: 05226935
Digital Object Identifier: doi:10.1214/07-AOAS131
Project Euclid: euclid.aoas/1196438020
Huang, D. W., Sherman, B. T. and Lempicki, R. A. (2009). Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature Protoc. 4 44–57.
Jacob, L., Obozinski, G. and Vert, J. P. (2009). Group lasso with overlap and graph lasso. In Proceedings of the International Conference on Machine Learning. ACM, Montreal, QC.
Jenatton, R., Audibert, J. Y. and Bach, F. (2009). Structured variable selection with sparsity-inducing norms. Technical report, INRIA.
Jenatton, R., Mairal, J., Obozinski, G. and Bach, F. (2010). Proximal methods for sparse hierarchical dictionary learning. In Proceedings of the International Conference on Machine Learning. Omnipress, Haifa.
Kim, S., Sohn, K. A. and Xing, E. P. (2009). A multivariate regression approach to association analysis of a quantitative trait network. Bioinformatics 25 204–212.
Kim, S. and Xing, E. P. (2009). Statistical estimation of correlated genome associations to a quantitative trait network. PLoS Genet. 5 e1000587.
Kim, S. and Xing, E. P. (2010). Tree-guided group lasso for multi-task regression with structured sparsity. In Proceedings of the International Conference on Machine Learning. Omnipress, Haifa.
Lan, G., Lu, Z. and Monteiro, R. (2011). Primal-dual first-order methods with ${O}(1/\varepsilon)$ iteration complexity for cone programming. Mathematical Programming 126 1–29.
Mathematical Reviews (MathSciNet): MR2764338
Digital Object Identifier: doi:10.1007/s10107-008-0261-6
Lange, K. (2004). Optimization. Springer, Berlin.
Liu, J., Ji, S. and Ye, J. (2009). Multi-task feature learning via efficient $\ell_{2,1}$-norm minimization. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI). AUAI Press, Montreal, QC.
Liu, J. and Ye, J. (2010a). Fast overlapping group lasso. Available at arXiv:1009.0306v1.
arXiv: 1009.0306v1
Liu, J. and Ye, J. (2010b). Moreau–Yosida regularization for grouped tree structure learning. In Advances in Neural Information Processing Systems (NIPS). Curran Associates, Inc., Vancouver, BC.
Liu, J., Yuan, L. and Ye, J. (2010). An efficient algorithm for a class of fused lasso problems. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, Washington, DC.
Ma, S. and Kosorok, M. R. (2010). Detection of gene pathways with predictive power for breast cancer prognosis. BMC Bioinformatics 11 1.
Mairal, J., Jenatton, R., Obozinski, G. and Bach, F. (2010). Network flow algorithms for structured sparsity. In Advances in Neural Information Processing Systems (NIPS). Curran Associates, Inc., Vancouver, BC.
Nesterov, Y. (2003). Excessive gap technique in non-smooth convex minimization. Technical report, Univ. Catholique de Louvain, Center for Operations Research and Econometrics (CORE).
Nesterov, Y. (2005). Smooth minimization of non-smooth functions. Mathematical Programming 103 127–152.
Mathematical Reviews (MathSciNet): MR2166537
Digital Object Identifier: doi:10.1007/s10107-004-0552-5
Nesterov, Y. (2007). Gradient methods for minimizing composite objective function. ECORE Discussion Paper 2007.
Obozinski, G., Taskar, B. and Jordan, M. I. (2009). High-dimensional union support recovery in multivariate regression. In Advances in Neural Information Processing Systems (NIPS). Curran Associates, Inc., Vancouver, BC.
Qi, L. and Sun, J. (1993). A nonsmooth version of Newton's method. Mathematical Programming 58 353–367.
Mathematical Reviews (MathSciNet): MR1216791
Zentralblatt MATH: 0780.90090
Digital Object Identifier: doi:10.1007/BF01581275
Rockafellar, R. (1996). Convex Analysis. Princeton Univ. Press, Princeton.
Subramanian, A., Tamayo, P. and Mootha, V. et al. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102 15545–15550.
Sun, D., Womersley, R. and Qi, H. (2002). A feasible semismooth asymptotically Newton method for mixed complementarity problems. Mathematical Programming, Ser. A 94 167–187.
Mathematical Reviews (MathSciNet): MR1953170
Digital Object Identifier: doi:10.1007/s10107-002-0305-2
The International HapMap Consortium. (2005). A haplotype map of the human genome. Nature 437 1299–1320.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58 267–288.
Mathematical Reviews (MathSciNet): MR1379242
Tibshirani, R. and Saunders, M. (2005). Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 67 91–108.
Mathematical Reviews (MathSciNet): MR2136641
Zentralblatt MATH: 1060.62049
Digital Object Identifier: doi:10.1111/j.1467-9868.2005.00490.x
Tibshirani, R. and Taylor, J. (2011). The solution path of the generalized lasso. Ann. Statist. 39 1335–1371.
Mathematical Reviews (MathSciNet): MR2850205
Zentralblatt MATH: 1234.62107
Digital Object Identifier: doi:10.1214/11-AOS878
Project Euclid: euclid.aos/1304514656
Tseng, P. (2008). On accelerated proximal gradient methods for convex-concave optimization. SIAM J. Optim. To appear.
Tütüncü, R. H., Toh, K. C. and Todd, M. J. (2003). Solving semidefinite-quadratic-linear programs using SDPT3. Mathematical Programming 95 189–217.
Mathematical Reviews (MathSciNet): MR1976479
Digital Object Identifier: doi:10.1007/s10107-002-0347-5
van de Vijver, M. J. et al. (2002). A gene-expression signature as a predictor of survival in breast cancer. N. Engl. J. Med. 347 1999–2009.
Wu, T. and Lange, K. (2008). Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2 224–244.
Mathematical Reviews (MathSciNet): MR2415601
Zentralblatt MATH: 1137.62045
Digital Object Identifier: doi:10.1214/07-AOAS147
Project Euclid: euclid.aoas/1206367819
Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 68 49–67.
Mathematical Reviews (MathSciNet): MR2212574
Zentralblatt MATH: 1141.62030
Digital Object Identifier: doi:10.1111/j.1467-9868.2005.00532.x
Zhang, Z., Lange, K., Ophoff, R. and Sabatti, C. (2010). Reconstructing DNA copy number by penalized estimation and imputation. Ann. Appl. Stat. 4 1749–1773.
Mathematical Reviews (MathSciNet): MR2829935
Zentralblatt MATH: 1220.62146
Digital Object Identifier: doi:10.1214/10-AOAS357
Project Euclid: euclid.aoas/1294167797
Zhao, P., Rocha, G. and Yu, B. (2009). The composite absolute penalties family for grouped and hierarchical variable selection. Ann. Statist. 37 3468–3497.
Mathematical Reviews (MathSciNet): MR2549566
Zentralblatt MATH: 05644286
Digital Object Identifier: doi:10.1214/07-AOS584
Project Euclid: euclid.aos/1250515393
Zhou, H., Alexander, D. and Lange, K. (2011). A quasi-Newton acceleration for high-dimensional optimization algorithms. Stat. Comput. 21 261–273.
Mathematical Reviews (MathSciNet): MR2774856
Zentralblatt MATH: 06113574
Digital Object Identifier: doi:10.1007/s11222-009-9166-3
Zhou, H. and Lange, K. (2011). A path algorithm for constrained estimation. Available at arXiv:1103.3738v1.
arXiv: 1103.3738v1