The Annals of Statistics

High-dimensional regression with noisy and missing data: Provable guarantees with nonconvexity

Po-Ling Loh and Martin J. Wainwright



Although the standard formulations of prediction problems involve fully observed, noiseless data drawn in an i.i.d. manner, many applications involve noisy and/or missing data, possibly with dependence as well. We study these issues in the context of high-dimensional sparse linear regression, and propose novel estimators for the cases of noisy, missing and/or dependent data. Many standard approaches to noisy or missing data, such as those using the EM algorithm, lead to optimization problems that are inherently nonconvex, and it is difficult to establish theoretical guarantees on practical algorithms. While our approach also involves optimizing nonconvex programs, we are able both to analyze the statistical error associated with any global optimum and, more surprisingly, to prove that a simple algorithm based on projected gradient descent converges in polynomial time to a small neighborhood of the set of all global minimizers. On the statistical side, we provide nonasymptotic bounds that hold with high probability for the cases of noisy, missing and/or dependent data. On the computational side, we prove that under the same types of conditions required for statistical consistency, the projected gradient descent algorithm is guaranteed to converge at a geometric rate to a near-global minimizer. We illustrate these theoretical predictions with simulations, showing close agreement with the predicted scalings.
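As a concrete illustration of the computational side described above, the following sketch runs projected gradient descent on an $\ell_1$-constrained quadratic program with noise-corrected surrogates. It is a minimal, assumption-laden special case, not the paper's full method: it assumes additive covariate noise with known covariance $\sigma_w^2 I$, so that $\hat\Gamma = Z^\top Z/n - \sigma_w^2 I$ and $\hat\gamma = Z^\top y/n$ serve as (possibly indefinite) surrogates, and it uses the sorting-based $\ell_1$-ball projection in the style of Duchi et al. [5]. All function names, the step-size choice, and the iteration count are illustrative.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of v onto the l1-ball of the given radius,
    via the sorting-based method (cf. Duchi et al. [5])."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]          # magnitudes, descending
    css = np.cumsum(u)
    # largest index rho with u[rho] - (css[rho] - radius)/(rho+1) > 0
    rho = np.nonzero(u - (css - radius) / (np.arange(len(u)) + 1) > 0)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def corrected_lasso_pgd(Z, y, sigma_w2, radius, step=None, n_iter=500):
    """Projected gradient descent on the (possibly nonconvex) program
        min_b  (1/2) b' Gamma b - gamma' b   s.t.  ||b||_1 <= radius,
    with the noise-corrected surrogates
        Gamma = Z'Z/n - sigma_w2 * I,   gamma = Z'y/n,
    for covariates observed with additive noise of covariance
    sigma_w2 * I (an assumed special case)."""
    n, p = Z.shape
    Gamma = Z.T @ Z / n - sigma_w2 * np.eye(p)
    gamma = Z.T @ y / n
    if step is None:
        step = 1.0 / np.linalg.norm(Gamma, 2)  # conservative step size
    b = np.zeros(p)
    for _ in range(n_iter):
        b = project_l1_ball(b - step * (Gamma @ b - gamma), radius)
    return b

# Usage sketch: noisy-covariate sparse regression with known noise level.
rng = np.random.default_rng(0)
n, p = 500, 20
beta = np.zeros(p)
beta[:3] = [1.0, -1.0, 0.5]
X = rng.standard_normal((n, p))
y = X @ beta + 0.1 * rng.standard_normal(n)
sigma_w2 = 0.25
Z = X + np.sqrt(sigma_w2) * rng.standard_normal((n, p))  # noisy covariates
beta_hat = corrected_lasso_pgd(Z, y, sigma_w2, radius=np.abs(beta).sum())
```

Note that $\hat\Gamma$ may be indefinite, so the program is nonconvex; the point of the paper's computational theory is that this simple update nonetheless converges geometrically to a near-global minimizer under the same conditions needed for statistical consistency.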

Article information

Ann. Statist., Volume 40, Number 3 (2012), 1637-1664.

First available in Project Euclid: 5 September 2012

Digital Object Identifier: doi:10.1214/12-AOS1018

Primary: 62F12: Asymptotic properties of estimators
Secondary: 68W25: Approximation algorithms

High-dimensional statistics; missing data; nonconvexity; regularization; sparse linear regression; $M$-estimation


Loh, Po-Ling; Wainwright, Martin J. High-dimensional regression with noisy and missing data: Provable guarantees with nonconvexity. Ann. Statist. 40 (2012), no. 3, 1637--1664. doi:10.1214/12-AOS1018.



  • [1] Agarwal, A., Negahban, S. and Wainwright, M. J. (2012). Fast global convergence of gradient methods for high-dimensional statistical recovery. Available at
  • [2] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
  • [3] Carroll, R. J., Ruppert, D. and Stefanski, L. A. (1995). Measurement Error in Nonlinear Models. Monographs on Statistics and Applied Probability 63. Chapman & Hall, London.
  • [4] Chen, S. S., Donoho, D. L. and Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20 33–61.
  • [5] Duchi, J., Shalev-Shwartz, S., Singer, Y. and Chandra, T. (2008). Efficient projections onto the $\ell_1$-ball for learning in high dimensions. In International Conference on Machine Learning 272–279. ACM, New York, NY.
  • [6] Hwang, J. T. (1986). Multiplicative errors-in-variables models with applications to recent data released by the U.S. Department of Energy. J. Amer. Statist. Assoc. 81 680–688.
  • [7] Iturria, S. J., Carroll, R. J. and Firth, D. (1999). Polynomial regression and estimating functions in the presence of multiplicative measurement error. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 547–561.
  • [8] Little, R. J. A. and Rubin, D. B. (1987). Statistical Analysis with Missing Data. Wiley, New York.
  • [9] Loh, P. and Wainwright, M. J. (2012). Supplement to “High-dimensional regression with noisy and missing data: Provable guarantees with nonconvexity.” DOI:10.1214/12-AOS1018SUPP.
  • [10] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. Ann. Statist. 34 1436–1462.
  • [11] Meinshausen, N. and Yu, B. (2009). Lasso-type recovery of sparse representations for high-dimensional data. Ann. Statist. 37 246–270.
  • [12] Negahban, S., Ravikumar, P., Wainwright, M. J. and Yu, B. (2009). A unified framework for the analysis of regularized $M$-estimators. In Advances in Neural Information Processing Systems. Curran Associates, Red Hook, NY.
  • [13] Raskutti, G., Wainwright, M. J. and Yu, B. (2010). Restricted eigenvalue properties for correlated Gaussian designs. J. Mach. Learn. Res. 11 2241–2259.
  • [14] Rosenbaum, M. and Tsybakov, A. B. (2010). Sparse recovery under matrix uncertainty. Ann. Statist. 38 2620–2651.
  • [15] Rosenbaum, M. and Tsybakov, A. B. (2011). Improved matrix uncertainty selector. Technical report. Available at
  • [16] Rothman, A. J., Bickel, P. J., Levina, E. and Zhu, J. (2008). Sparse permutation invariant covariance estimation. Electron. J. Stat. 2 494–515.
  • [17] Rudelson, M. and Zhou, S. (2011). Reconstruction from anisotropic random measurements. Technical report, Univ. Michigan.
  • [18] Städler, N. and Bühlmann, P. (2012). Missing values: Sparse inverse covariance estimation and an extension to sparse regression. Statist. Comput. 22 219–235.
  • [19] Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 58 267–288.
  • [20] van de Geer, S. A. and Bühlmann, P. (2009). On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 3 1360–1392.
  • [21] Xu, Q. and You, J. (2007). Covariate selection for linear errors-in-variables regression models. Comm. Statist. Theory Methods 36 375–386.
  • [22] Yuan, M. (2010). High dimensional inverse covariance matrix estimation via linear programming. J. Mach. Learn. Res. 11 2261–2286.
  • [23] Zhang, C.-H. and Huang, J. (2008). The sparsity and bias of the LASSO selection in high-dimensional linear regression. Ann. Statist. 36 1567–1594.

Supplemental materials

  • Supplementary material for "High-dimensional regression with noisy and missing data: Provable guarantees with nonconvexity." Due to space constraints, technical details of the remaining proofs are relegated to the supplement [9].