## The Annals of Statistics

### I-LAMM for sparse learning: Simultaneous control of algorithmic complexity and statistical error

#### Abstract

We propose a computational framework named iterative local adaptive majorize-minimization (I-LAMM) to simultaneously control algorithmic complexity and statistical error when fitting high-dimensional models. I-LAMM is a two-stage algorithmic implementation of the local linear approximation to a family of folded concave penalized quasi-likelihood. The first stage solves a convex program with a crude precision tolerance to obtain a coarse initial estimator, which is further refined in the second stage by iteratively solving a sequence of convex programs with smaller precision tolerances. Theoretically, we establish a phase transition: the first stage has a sublinear iteration complexity, while the second stage achieves an improved linear rate of convergence. Though this framework is completely algorithmic, it provides solutions with optimal statistical performances and controlled algorithmic complexity for a large family of nonconvex optimization problems. The iteration effects on statistical errors are clearly demonstrated via a contraction property. Our theory relies on a localized version of the sparse/restricted eigenvalue condition, which allows us to analyze a large family of loss and penalty functions and provide optimality guarantees under very weak assumptions (e.g., I-LAMM requires much weaker minimal signal strength than other procedures). Thorough numerical results are provided to support the obtained theory.

#### Article information

Source
Ann. Statist., Volume 46, Number 2 (2018), 814-841.

Dates
Revised: March 2017
First available in Project Euclid: 3 April 2018

https://projecteuclid.org/euclid.aos/1522742437

Digital Object Identifier
doi:10.1214/17-AOS1568

Mathematical Reviews number (MathSciNet)
MR3782385

Zentralblatt MATH identifier
06870280

Subjects
Primary: 62J07: Ridge regression; shrinkage estimators
Secondary: 62C20: Minimax procedures 62H35: Image analysis

#### Citation

Fan, Jianqing; Liu, Han; Sun, Qiang; Zhang, Tong. I-LAMM for sparse learning: Simultaneous control of algorithmic complexity and statistical error. Ann. Statist. 46 (2018), no. 2, 814--841. doi:10.1214/17-AOS1568. https://projecteuclid.org/euclid.aos/1522742437

#### References

• Agarwal, A., Negahban, S. and Wainwright, M. J. (2012). Fast global convergence of gradient methods for high-dimensional statistical recovery. Ann. Statist. 40 2452–2482.
• Beck, A. and Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2 183–202.
• Belloni, A. and Chernozhukov, V. (2013). Least squares after model selection in high-dimensional sparse models. Bernoulli 19 521–547.
• Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
• Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge Univ. Press, Cambridge.
• Breheny, P. and Huang, J. (2011). Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat. 5 232–253.
• Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data. Springer, Heidelberg.
• Bunea, F., Tsybakov, A. and Wegkamp, M. (2007). Sparsity oracle inequalities for the Lasso. Electron. J. Stat. 1 169–194.
• Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
• Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B. Stat. Methodol. 70 849–911.
• Fan, J. and Lv, J. (2011). Nonconcave penalized likelihood with NP-dimensionality. IEEE Trans. Inform. Theory 57 5467–5484.
• Fan, J., Xue, L. and Zou, H. (2014). Strong oracle optimality of folded concave penalized estimation. Ann. Statist. 42 819–849.
• Fan, J., Liu, H., Sun, Q. and Zhang, T. (2018). Supplement to “I-LAMM for sparse learning: Simultaneous control of algorithmic complexity and statistical error.” DOI:10.1214/17-AOS1568SUPP.
• Friedman, J., Hastie, T., Höfling, H. and Tibshirani, R. (2007). Pathwise coordinate optimization. Ann. Appl. Stat. 1 302–332.
• Hunter, D. R. and Lange, K. (2004). A tutorial on MM algorithms. Amer. Statist. 58 30–37.
• Kim, Y., Choi, H. and Oh, H.-S. (2008). Smoothly clipped absolute deviation on high dimensions. J. Amer. Statist. Assoc. 103 1665–1673.
• Kim, Y. and Kwon, S. (2012). Global optimality of nonconvex penalized estimators. Biometrika 99 315–325.
• Lange, K., Hunter, D. R. and Yang, I. (2000). Optimization transfer using surrogate objective functions. J. Comput. Graph. Statist. 9 1–59.
• Loh, P.-L. (2017). Statistical consistency and asymptotic normality for high-dimensional robust $M$-estimators. Ann. Statist. 45 866–896.
• Loh, P.-L. and Wainwright, M. J. (2014). Support recovery without incoherence: A case for nonconvex regularization. To appear. Available at arXiv:1412.5632.
• Loh, P.-L. and Wainwright, M. J. (2015). Regularized $M$-estimators with nonconvexity: Statistical and algorithmic theory for local optima. J. Mach. Learn. Res. 16 559–616.
• Lozano, A. C. and Meinshausen, N. (2013). Minimum distance estimation for robust high-dimensional regression. Available at arXiv:1307.3227.
• Negahban, S. N., Ravikumar, P., Wainwright, M. J. and Yu, B. (2012). A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers. Statist. Sci. 27 538–557.
• Nesterov, Y. (2013). Gradient methods for minimizing composite functions. Math. Program. 140 125–161.
• Raskutti, G., Wainwright, M. J. and Yu, B. (2010). Restricted eigenvalue properties for correlated Gaussian designs. J. Mach. Learn. Res. 11 2241–2259.
• Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
• van de Geer, S. A. and Bühlmann, P. (2009). On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 3 1360–1392.
• Wang, L., Kim, Y. and Li, R. (2013). Calibrating nonconvex penalized regression in ultra-high dimension. Ann. Statist. 41 2505–2536.
• Wang, Z., Liu, H. and Zhang, T. (2014). Optimal computational and statistical rates of convergence for sparse nonconvex learning problems. Ann. Statist. 42 2164–2201.
• Zhang, T. (2009). Some sharp performance bounds for least squares regression with $L_{1}$ regularization. Ann. Statist. 37 2109–2144.
• Zhang, C.-H. (2010a). Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38 894–942.
• Zhang, T. (2010b). Analysis of multi-stage convex relaxation for sparse regularization. J. Mach. Learn. Res. 11 1081–1107.
• Zhang, C.-H. and Zhang, T. (2012). A general theory of concave regularization for high-dimensional sparse estimation problems. Statist. Sci. 27 576–593.
• Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.
• Zou, H. and Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. Ann. Statist. 36 1509–1533.

#### Supplemental materials

• Supplement to “I-LAMM for Sparse learning: simultaneous control of algorithmic complexity and statistical error”. The Supplementary Material [Fan et al. (2018)] contains proofs for Corollary 4.3, Theorem 4.4, Proposition 4.5, Proposition 4.6 and Theorem 4.7 in Section 4. It collects proofs of the lemmas presented in Section 5. An application to robust linear regression is given in Appendix D. Other technical lemmas are collected in Appendices E and F.