We study the problem of estimating high-dimensional regression
models regularized by a structured sparsity-inducing penalty
that encodes prior structural information on either the input or
output variables. We consider two widely adopted types of
penalties of this kind as motivating examples: (1) the general
overlapping-group-lasso penalty, generalized from the
group-lasso penalty; and (2) the graph-guided-fused-lasso
penalty, generalized from the fused-lasso penalty. For both
types of penalties, due to their nonseparability and
nonsmoothness, developing an efficient optimization method
remains a challenging problem. In this paper we propose a
general optimization approach, the smoothing proximal
gradient (SPG) method, which can solve structured
sparse regression problems with any smooth convex loss under a
wide spectrum of structured sparsity-inducing penalties. Our
approach combines a smoothing technique with an effective
proximal gradient method. It achieves a convergence rate
significantly faster than that of the standard first-order method,
the subgradient method, and is much more scalable than the most
widely used interior-point methods. The efficiency and
scalability of our method are demonstrated on both simulation
experiments and real genetic data sets.
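
To make the combination of smoothing and proximal gradient steps concrete, the following is a minimal sketch for one special case, and not the implementation used in the paper. It assumes a squared-error loss, a fused-lasso-type penalty written as ||C b||_1 (with the penalty weight already folded into the matrix C), and an additional lasso penalty lam * ||b||_1. The nonseparable term ||C b||_1 is replaced by a Nesterov-type smooth approximation with smoothing parameter mu, and the separable lasso term is handled by soft-thresholding. The function names, the default mu and iteration count, and the FISTA-style momentum update are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def objective(X, y, C, lam, b):
    """Original (unsmoothed) objective: squared loss + ||C b||_1 + lam * ||b||_1."""
    resid = y - X @ b
    return 0.5 * resid @ resid + np.abs(C @ b).sum() + lam * np.abs(b).sum()


def spg_sketch(X, y, C, lam, mu=1e-3, n_iter=500):
    """Illustrative smoothing proximal gradient loop (hypothetical helper).

    The term ||C b||_1 = max_{||a||_inf <= 1} a^T C b is smoothed to
    max_{||a||_inf <= 1} (a^T C b - mu/2 * ||a||^2), whose gradient is
    C^T a*, with a* the projection of C b / mu onto the unit l_inf ball.
    """
    p = X.shape[1]
    # Lipschitz constant of the gradient of (loss + smoothed penalty).
    L = np.linalg.norm(X, 2) ** 2 + np.linalg.norm(C, 2) ** 2 / mu
    b = np.zeros(p)
    w = b.copy()  # extrapolated point
    t = 1.0
    for _ in range(n_iter):
        a_star = np.clip(C @ w / mu, -1.0, 1.0)
        grad = X.T @ (X @ w - y) + C.T @ a_star
        # Proximal step on the separable lasso term.
        b_new = soft_threshold(w - grad / L, lam / L)
        # FISTA-style momentum (an assumed choice of acceleration scheme).
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        w = b_new + ((t - 1.0) / t_new) * (b_new - b)
        b, t = b_new, t_new
    return b


if __name__ == "__main__":
    # Synthetic example: piecewise-constant coefficients on a chain graph.
    rng = np.random.default_rng(0)
    n, p = 50, 20
    X = rng.standard_normal((n, p))
    b_true = np.zeros(p)
    b_true[:5] = 2.0
    y = X @ b_true + 0.1 * rng.standard_normal(n)
    C = 0.5 * np.diff(np.eye(p), axis=0)  # weighted differences of neighbors
    b_hat = spg_sketch(X, y, C, lam=0.1)
    print(objective(X, y, C, 0.1, np.zeros(p)), objective(X, y, C, 0.1, b_hat))
```

In this sketch a smaller mu gives a closer approximation to the original penalty but a larger Lipschitz constant and hence smaller steps, which is the trade-off that the choice of smoothing parameter has to balance.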