The Annals of Applied Statistics

Smoothing proximal gradient method for general structured sparse regression

Abstract

We study the problem of estimating high-dimensional regression models regularized by a structured sparsity-inducing penalty that encodes prior structural information on either the input or output variables. We consider two widely adopted types of penalties of this kind as motivating examples: (1) the general overlapping-group-lasso penalty, generalized from the group-lasso penalty; and (2) the graph-guided-fused-lasso penalty, generalized from the fused-lasso penalty. For both types of penalties, developing an efficient optimization method remains challenging because the penalties are both nonseparable and nonsmooth. In this paper we propose a general optimization approach, the smoothing proximal gradient (SPG) method, which can solve structured sparse regression problems with any smooth convex loss under a wide spectrum of structured sparsity-inducing penalties. Our approach combines a smoothing technique with an effective proximal gradient method. It achieves a convergence rate significantly faster than that of standard first-order methods such as subgradient methods, and it is much more scalable than the most widely used interior-point methods. The efficiency and scalability of our method are demonstrated on both simulation experiments and real genetic data sets.
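To make the idea concrete, the sketch below implements the core SPG recipe for one special case: a least-squares loss with an overlapping-group-lasso penalty plus an $\ell_1$ term. The nonsmooth group penalty is replaced by its Nesterov-smoothed surrogate (whose gradient is a projection onto per-group unit balls), and the separable $\ell_1$ part is handled exactly by a soft-thresholding proximal step inside a FISTA-style accelerated loop. This is a hypothetical toy implementation for illustration, not the authors' code; the function name, group encoding, and parameter defaults are assumptions.

```python
import numpy as np

def spg_overlapping_group_lasso(X, y, groups, lam=0.05, gamma=0.05,
                                mu=1e-3, n_iter=500):
    """Toy smoothing proximal gradient (SPG) sketch for
        0.5*||X b - y||^2 + gamma * sum_g ||b_g||_2 + lam * ||b||_1,
    where `groups` is a list of (possibly overlapping) index arrays.
    The group penalty is smoothed with parameter mu (Nesterov smoothing);
    the l1 term is kept exact via its proximal operator."""
    n, p = X.shape
    beta = np.zeros(p)
    w = beta.copy()  # FISTA auxiliary point
    t = 1.0
    # Lipschitz bound for the smoothed objective's gradient:
    # ||X||_2^2 + gamma^2 * ||C||^2 / mu, where ||C||^2 is at most the
    # maximum number of groups covering any single coordinate.
    cover = np.zeros(p)
    for g in groups:
        cover[g] += 1
    L = np.linalg.norm(X, 2) ** 2 + gamma ** 2 * cover.max() / mu
    for _ in range(n_iter):
        # Gradient of the smoothed group penalty: C' alpha*, where each
        # dual block alpha*_g projects (gamma * w_g) / mu onto the l2 ball.
        grad_pen = np.zeros(p)
        for g in groups:
            a = gamma * w[g] / mu
            nrm = np.linalg.norm(a)
            if nrm > 1.0:
                a /= nrm
            grad_pen[g] += gamma * a
        grad = X.T @ (X @ w - y) + grad_pen
        # Proximal step: soft-thresholding handles the separable l1 part.
        z = w - grad / L
        beta_new = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
        # Standard FISTA momentum update.
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        w = beta_new + (t - 1.0) / t_new * (beta_new - beta)
        beta, t = beta_new, t_new
    return beta
```

Because the smoothed gradient and the proximal step are both cheap (linear in the total group size), each iteration scales to large $p$; the paper's contribution is showing that this scheme attains a faster rate than subgradient methods while avoiding the per-iteration cost of interior-point solvers.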

Article information

Source
Ann. Appl. Stat. Volume 6, Number 2 (2012), 719–752.

Dates
First available in Project Euclid: 11 June 2012

Permanent link to this document
http://projecteuclid.org/euclid.aoas/1339419614

Digital Object Identifier
doi:10.1214/11-AOAS514

Mathematical Reviews number (MathSciNet)
MR2976489

Zentralblatt MATH identifier
06062737

Citation

Chen, Xi; Lin, Qihang; Kim, Seyoung; Carbonell, Jaime G.; Xing, Eric P. Smoothing proximal gradient method for general structured sparse regression. Ann. Appl. Stat. 6 (2012), no. 2, 719--752. doi:10.1214/11-AOAS514. http://projecteuclid.org/euclid.aoas/1339419614.

References

• Abate, N., Chandalia, M., Satija, P. and Adams-Huet, B. et al. (2005). ENPP1/PC-1 K121Q polymorphism and genetic susceptibility to type 2 diabetes. Diabetes 54 1207–1213.
• Beck, A. and Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2 183–202.
• Bertsekas, D. (1999). Nonlinear Programming. Athena Scientific, Nashua, NH.
• Duchi, J. and Singer, Y. (2009). Efficient online and batch learning using forward backward splitting. J. Mach. Learn. Res. 10 2899–2934.
• Friedman, J., Hastie, T. and Tibshirani, R. (2010). A note on the group lasso and a sparse group lasso. Dept. Statistics, Stanford Univ.
• Friedman, J., Hastie, T., Höfling, H. and Tibshirani, R. (2007). Pathwise coordinate optimization. Ann. Appl. Stat. 1 302–332.
• Huang, D. W., Sherman, B. T. and Lempicki, R. A. (2009). Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature Protoc. 4 44–57.
• Jacob, L., Obozinski, G. and Vert, J. P. (2009). Group lasso with overlap and graph lasso. In Proceedings of the International Conference on Machine Learning. ACM, Montreal, QC.
• Jenatton, R., Audibert, J. Y. and Bach, F. (2009). Structured variable selection with sparsity-inducing norms. Technical report, INRIA.
• Jenatton, R., Mairal, J., Obozinski, G. and Bach, F. (2010). Proximal methods for sparse hierarchical dictionary learning. In Proceedings of the International Conference on Machine Learning. Omnipress, Haifa.
• Kim, S., Sohn, K. A. and Xing, E. P. (2009). A multivariate regression approach to association analysis of a quantitative trait network. Bioinformatics 25 204–212.
• Kim, S. and Xing, E. P. (2009). Statistical estimation of correlated genome associations to a quantitative trait network. PLoS Genet. 5 e1000587.
• Kim, S. and Xing, E. P. (2010). Tree-guided group lasso for multi-task regression with structured sparsity. In Proceedings of the International Conference on Machine Learning. Omnipress, Haifa.
• Lan, G., Lu, Z. and Monteiro, R. (2011). Primal-dual first-order methods with $O(1/\varepsilon)$ iteration complexity for cone programming. Mathematical Programming 126 1–29.
• Lange, K. (2004). Optimization. Springer, Berlin.
• Liu, J., Ji, S. and Ye, J. (2009). Multi-task feature learning via efficient $\ell_{2,1}$-norm minimization. In Proceedings of the Uncertainty in AI. AUAI Press, Montreal, QC.
• Liu, J. and Ye, J. (2010a). Fast overlapping group lasso. Available at arXiv:1009.0306v1.
• Liu, J. and Ye, J. (2010b). Moreau–Yosida regularization for grouped tree structure learning. In Advances in Neural Information Processing Systems (NIPS). Curran Associates, Inc., Vancouver, BC.
• Liu, J., Yuan, L. and Ye, J. (2010). An efficient algorithm for a class of fused lasso problems. In The ACM SIG Knowledge Discovery and Data Mining. ACM, Washington, DC.
• Ma, S. and Kosorok, M. R. (2010). Detection of gene pathways with predictive power for breast cancer prognosis. BMC Bioinformatics 11 1.
• Mairal, J., Jenatton, R., Obozinski, G. and Bach, F. (2010). Network flow algorithms for structured sparsity. In Advances in Neural Information Processing Systems (NIPS). Curran Associates, Inc., Vancouver, BC.
• Nesterov, Y. (2003). Excessive gap technique in non-smooth convex minimization. Technical report, Univ. Catholique de Louvain, Center for Operations Research and Econometrics (CORE).
• Nesterov, Y. (2005). Smooth minimization of non-smooth functions. Mathematical Programming 103 127–152.
• Nesterov, Y. (2007). Gradient methods for minimizing composite objective function. ECORE Discussion Paper 2007.
• Obozinski, G., Taskar, B. and Jordan, M. I. (2009). High-dimensional union support recovery in multivariate regression. In Advances in Neural Information Processing Systems (NIPS). Curran Associates, Inc., Vancouver, BC.
• Qi, L. and Sun, J. (1993). A nonsmooth version of Newton's method. Mathematical Programming 58 353–367.
• Rockafellar, R. (1996). Convex Analysis. Princeton Univ. Press, Princeton.
• Subramanian, A., Tamayo, P. and Mootha, V. et al. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102 15545–15550.
• Sun, D., Womersley, R. and Qi, H. (2002). A feasible semismooth asymptotically Newton method for mixed complementarity problems. Mathematical Programming, Ser. A 94 167–187.
• The International HapMap Consortium. (2005). A haplotype map of the human genome. Nature 437 1299–1320.
• Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58 267–288.
• Tibshirani, R. and Saunders, M. (2005). Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 67 91–108.
• Tibshirani, R. and Taylor, J. (2010). The solution path of the generalized lasso. Ann. Statist. 39 1335–1371.
• Tseng, P. (2008). On accelerated proximal gradient methods for convex-concave optimization. SIAM J. Optim. To appear.
• Tütüncü, R. H., Toh, K. C. and Todd, M. J. (2003). Solving semidefinite-quadratic-linear programs using SDPT3. Mathematical Programming 95 189–217.
• van de Vijver, M. J. et al. (2002). A gene-expression signature as a predictor of survival in breast cancer. New England Journal of Medicine 347 1999–2009.
• Wu, T. and Lange, K. (2008). Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2 224–244.
• Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 68 49–67.
• Zhang, Z., Lange, K., Ophoff, R. and Sabatti, C. (2010). Reconstructing DNA copy number by penalized estimation and imputation. Ann. Appl. Stat. 4 1749–1773.
• Zhao, P., Rocha, G. and Yu, B. (2009). The composite absolute penalties family for grouped and hierarchical variable selection. Ann. Statist. 37 3468–3497.
• Zhou, H., Alexander, D. and Lange, K. (2011). A quasi-Newton acceleration for high-dimensional optimization algorithms. Stat. Comput. 21 261–273.
• Zhou, H. and Lange, K. (2011). A path algorithm for constrained estimation. Available at arXiv:1103.3738v1.