The Annals of Statistics

Global solutions to folded concave penalized nonconvex learning

Hongcheng Liu, Tao Yao, and Runze Li


Abstract

This paper is concerned with solving nonconvex learning problems with a folded concave penalty. Although the global solutions of such problems enjoy desirable statistical properties, optimization techniques that guarantee global optimality in a general setting have been lacking. In this paper, we show that a class of nonconvex learning problems is equivalent to general quadratic programs. This equivalence allows us to develop mixed integer linear programming reformulations, which admit finite algorithms that find a provably globally optimal solution. We refer to this reformulation-based technique as mixed integer programming-based global optimization (MIPGO). To our knowledge, this is the first global optimization scheme with a theoretical guarantee for folded concave penalized nonconvex learning with the SCAD penalty [J. Amer. Statist. Assoc. 96 (2001) 1348–1360] and the MCP penalty [Ann. Statist. 38 (2010) 894–942]. Numerical results indicate that MIPGO significantly outperforms the state-of-the-art solution scheme, local linear approximation, as well as other alternative solution techniques in the literature, in terms of solution quality.
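For reference, the two penalties named above have standard closed forms: the SCAD penalty of Fan and Li (2001), with parameter a > 2, and the MCP of Zhang (2010), with parameter a > 1. Both are piecewise quadratic in |t| and flatten to a constant for large |t|, which helps explain the quadratic-program equivalence described above:

$$
p_\lambda^{\mathrm{SCAD}}(t) =
\begin{cases}
\lambda |t|, & |t| \le \lambda, \\
\dfrac{2a\lambda |t| - t^2 - \lambda^2}{2(a-1)}, & \lambda < |t| \le a\lambda, \\
\dfrac{(a+1)\lambda^2}{2}, & |t| > a\lambda,
\end{cases}
\qquad
p_\lambda^{\mathrm{MCP}}(t) =
\begin{cases}
\lambda |t| - \dfrac{t^2}{2a}, & |t| \le a\lambda, \\
\dfrac{a\lambda^2}{2}, & |t| > a\lambda.
\end{cases}
$$

The following is a minimal NumPy sketch of these two penalty functions, for illustration only; it is not the paper's MIPGO procedure, and the function names are our own:

    import numpy as np

    def scad_penalty(t, lam, a=3.7):
        """SCAD penalty (Fan and Li, 2001), elementwise; requires a > 2."""
        t = np.abs(np.asarray(t, dtype=float))
        inner = lam * t                                             # linear near zero
        middle = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))  # concave quadratic
        outer = (a + 1) * lam**2 / 2                                # constant tail
        return np.where(t <= lam, inner, np.where(t <= a * lam, middle, outer))

    def mcp_penalty(t, lam, a=2.0):
        """MCP (Zhang, 2010), elementwise; requires a > 1."""
        t = np.abs(np.asarray(t, dtype=float))
        return np.where(t <= a * lam, lam * t - t**2 / (2 * a), a * lam**2 / 2)

    # Both penalties grow like lam*|t| near zero and level off for large |t|,
    # which is what removes the shrinkage bias on large coefficients.
    grid = np.linspace(-3.0, 3.0, 7)
    print(scad_penalty(grid, lam=1.0))
    print(mcp_penalty(grid, lam=1.0))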

Article information

Source
Ann. Statist., Volume 44, Number 2 (2016), 629–659.

Dates
Received: July 2014
Revised: June 2015
First available in Project Euclid: 17 March 2016

Permanent link to this document
https://projecteuclid.org/euclid.aos/1458245730

Digital Object Identifier
doi:10.1214/15-AOS1380

Mathematical Reviews number (MathSciNet)
MR3476612

Zentralblatt MATH identifier
1337.62163

Subjects
Primary: 62J05: Linear regression
Secondary: 62J07: Ridge regression; shrinkage estimators

Keywords
Folded concave penalties; global optimization; high-dimensional statistical learning; MCP; nonconvex quadratic programming; SCAD; sparse recovery

Citation

Liu, Hongcheng; Yao, Tao; Li, Runze. Global solutions to folded concave penalized nonconvex learning. Ann. Statist. 44 (2016), no. 2, 629–659. doi:10.1214/15-AOS1380. https://projecteuclid.org/euclid.aos/1458245730



References

  • Bertsimas, D., Chang, A. and Rudin, C. (2011). Integer optimization methods for supervised ranking. Available at http://hdl.handle.net/1721.1/67362.
  • Breheny, P. and Huang, J. (2011). Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat. 5 232–253.
  • Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
  • Fan, J. and Lv, J. (2011). Nonconcave penalized likelihood with NP-dimensionality. IEEE Trans. Inform. Theory 57 5467–5484.
  • Fan, J., Xue, L. and Zou, H. (2014). Strong oracle optimality of folded concave penalized estimation. Ann. Statist. 42 819–849.
  • Grant, M. C. and Boyd, S. P. (2008). Graph implementations for nonsmooth convex programs. In Recent Advances in Learning and Control. Lecture Notes in Control and Inform. Sci. 371 95–110. Springer, London.
  • Grant, M. and Boyd, S. (2013). CVX: Matlab software for disciplined convex programming, version 2.0 beta. Available at http://cvxr.com/cvx.
  • Huang, J. and Zhang, C.-H. (2012). Estimation and selection via absolute penalized convex minimization and its multistage adaptive applications. J. Mach. Learn. Res. 13 1839–1864.
  • Hunter, D. R. and Li, R. (2005). Variable selection using MM algorithms. Ann. Statist. 33 1617–1642.
  • Kim, Y., Choi, H. and Oh, H.-S. (2008). Smoothly clipped absolute deviation on high dimensions. J. Amer. Statist. Assoc. 103 1665–1673.
  • Lan, W., Zhong, P.-S., Li, R., Wang, H. and Tsai, C.-L. (2013). Testing a single regression coefficient in high dimensional linear models. Working paper.
  • Lawler, E. L. and Wood, D. E. (1966). Branch-and-bound methods: A survey. Oper. Res. 14 699–719.
  • Li, R., Zhong, W. and Zhu, L. (2012). Feature screening via distance correlation learning. J. Amer. Statist. Assoc. 107 1129–1139.
  • Liu, H., Yao, T. and Li, R. (2016). Supplement to “Global solutions to folded concave penalized nonconvex learning.” DOI:10.1214/15-AOS1380SUPP.
  • Loh, P.-L. and Wainwright, M. J. (2015). Regularized $M$-estimators with nonconvexity: Statistical and algorithmic theory for local optima. J. Mach. Learn. Res. 16 559–616.
  • Martí, R. and Reinelt, G. (2011). Branch-and-bound. In The Linear Ordering Problem. Springer, Heidelberg.
  • Mazumder, R., Friedman, J. and Hastie, T. (2011). SparseNet: Coordinate descent with non-convex penalties. J. Amer. Statist. Assoc. 106 1125–1138.
  • Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
  • Nesterov, Y. (2007). Gradient methods for minimizing composite objective function. CORE Discussion Paper 2007076, Université Catholique de Louvain, Center for Operations Research and Econometrics (CORE).
  • Pardalos, P. M. (1991). Global optimization algorithms for linearly constrained indefinite quadratic problems. Comput. Math. Appl. 21 87–97.
  • Vandenbussche, D. and Nemhauser, G. L. (2005). A polyhedral study of nonconvex quadratic programs with box constraints. Math. Program. 102 531–557.
  • Vavasis, S. A. (1991). Nonlinear Optimization. International Series of Monographs on Computer Science 8. The Clarendon Press, Oxford Univ. Press, New York.
  • Vavasis, S. A. (1992). Approximation algorithms for indefinite quadratic programming. Math. Program. 57 279–311.
  • Wang, H. (2009). Forward regression for ultra-high dimensional variable screening. J. Amer. Statist. Assoc. 104 1512–1524.
  • Wang, L., Kim, Y. and Li, R. (2013). Calibrating nonconvex penalized regression in ultra-high dimension. Ann. Statist. 41 2505–2536.
  • Wang, Z., Liu, H. and Zhang, T. (2014). Optimal computational and statistical rates of convergence for sparse nonconvex learning problems. Ann. Statist. 42 2164–2201.
  • Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38 894–942.
  • Zhang, C.-H. and Zhang, T. (2012). A general theory of concave regularization for high-dimensional sparse estimation problems. Statist. Sci. 27 576–593.
  • Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.
  • Zou, H. and Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. Ann. Statist. 36 1509–1533.

Supplemental materials

  • Supplement to “Global solutions to folded concave penalized nonconvex learning”. This supplemental material includes the proofs of Propositions 2.1 and 2.3 and of Lemma 4.1, together with some additional numerical results.