The Annals of Statistics

Statistical consistency and asymptotic normality for high-dimensional robust $M$-estimators

Po-Ling Loh

Abstract

We study theoretical properties of regularized robust $M$-estimators, applicable when data are drawn from a sparse high-dimensional linear model and contaminated by heavy-tailed distributions and/or outliers in the additive errors and covariates. We first establish a form of local statistical consistency for the penalized regression estimators under fairly mild conditions on the error distribution: When the derivative of the loss function is bounded and satisfies a local restricted curvature condition, all stationary points within a constant radius of the true regression vector converge at the minimax rate enjoyed by the Lasso with sub-Gaussian errors. When an appropriate nonconvex regularizer is used in place of an $\ell_{1}$-penalty, we show that such stationary points are in fact unique and equal to the local oracle solution with the correct support; hence, results on asymptotic normality in the low-dimensional case carry over immediately to the high-dimensional setting. This has important implications for the efficiency of regularized nonconvex $M$-estimators when the errors are heavy-tailed. Our analysis of the local curvature of the loss function also has useful consequences for optimization when the robust regression function and/or regularizer is nonconvex and the objective function possesses stationary points outside the local region. We show that as long as a composite gradient descent algorithm is initialized within a constant radius of the true regression vector, successive iterates will converge at a linear rate to a stationary point within the local region. Furthermore, the global optimum of a convex regularized robust regression function may be used to obtain a suitable initialization. The result is a novel two-step procedure that uses a convex $M$-estimator to achieve consistency and a nonconvex $M$-estimator to increase efficiency. We conclude with simulation results that corroborate our theoretical findings.
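To make the two-step procedure concrete, the following is a minimal Python sketch, not the author's implementation: it pairs the Huber loss with an $\ell_{1}$ penalty in the convex first stage and with an MCP penalty in the nonconvex second stage, solving both stages by composite gradient descent with the second stage warm-started at the convex solution. The tuning constants (delta = 1.345, gamma = 3), the step-size rule, and all function names are illustrative assumptions.

```python
import numpy as np


def huber_grad(r, delta):
    # Bounded derivative psi of the Huber loss: identity on [-delta, delta], clipped outside.
    return np.clip(r, -delta, delta)


def soft_threshold(z, t):
    # Proximal operator of the l1 penalty.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def mcp_prox(z, lam, gamma, eta):
    # Closed-form proximal operator of the MCP penalty for step size eta (valid when gamma > eta).
    return np.where(np.abs(z) > gamma * lam,
                    z,
                    soft_threshold(z, eta * lam) / (1.0 - eta / gamma))


def composite_gradient_descent(X, y, beta0, prox, lam, delta=1.345, n_iter=1000):
    # Iterates beta <- prox(beta - eta * grad L_n(beta)), where L_n is the Huber empirical loss.
    n = X.shape[0]
    eta = n / np.linalg.norm(X, 2) ** 2  # step size from a crude bound on the curvature of L_n
    beta = beta0.copy()
    for _ in range(n_iter):
        residual = y - X @ beta
        grad = -X.T @ huber_grad(residual, delta) / n
        beta = prox(beta - eta * grad, lam, eta)
    return beta


def two_step_estimator(X, y, lam, delta=1.345, gamma=3.0):
    # Step 1: convex Huber + l1 estimator, started at zero, to obtain a consistent initialization.
    p = X.shape[1]
    beta_convex = composite_gradient_descent(
        X, y, np.zeros(p),
        prox=lambda z, l, e: soft_threshold(z, e * l),
        lam=lam, delta=delta)
    # Step 2: nonconvex Huber + MCP estimator, warm-started at the convex solution.
    return composite_gradient_descent(
        X, y, beta_convex,
        prox=lambda z, l, e: mcp_prox(z, l, gamma, e),
        lam=lam, delta=delta)


# Illustration on synthetic data with a sparse signal and heavy-tailed (Student-t) errors.
rng = np.random.default_rng(0)
n, p, k = 200, 500, 5
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:k] = 1.0
y = X @ beta_true + rng.standard_t(df=2, size=n)
beta_hat = two_step_estimator(X, y, lam=np.sqrt(np.log(p) / n))
print("l2 estimation error:", np.linalg.norm(beta_hat - beta_true))
```

In the spirit of the theory above, the convex stage supplies an initialization within a constant radius of the true regression vector, and the warm-started nonconvex stage then refines toward the oracle-like stationary point.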

Article information

Source
Ann. Statist. Volume 45, Number 2 (2017), 866-896.

Dates
Received: January 2015
Revised: April 2016
First available in Project Euclid: 16 May 2017

Permanent link to this document
https://projecteuclid.org/euclid.aos/1494921960

Digital Object Identifier
doi:10.1214/16-AOS1471

Zentralblatt MATH identifier
1371.62023

Subjects
Primary: 62F12: Asymptotic properties of estimators

Keywords
Robust regression; high-dimensional statistics; statistical consistency; support recovery; nonconvex optimization

Citation

Loh, Po-Ling. Statistical consistency and asymptotic normality for high-dimensional robust $M$-estimators. Ann. Statist. 45 (2017), no. 2, 866--896. doi:10.1214/16-AOS1471. https://projecteuclid.org/euclid.aos/1494921960



References

  • Agarwal, A., Negahban, S. and Wainwright, M. J. (2012). Fast global convergence of gradient methods for high-dimensional statistical recovery. Ann. Statist. 40 2452–2482.
  • Bai, Z. D. and Wu, Y. (1997). General $M$-estimation. J. Multivariate Anal. 63 119–135.
  • Bertsekas, D. P. (1999). Nonlinear Programming. Athena Scientific, Belmont, MA.
  • Bickel, P. J. (1975). One-step Huber estimates in the linear model. J. Amer. Statist. Assoc. 70 428–434.
  • Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
  • Box, G. E. P. (1953). Non-normality and tests on variances. Biometrika 40 318–335.
  • Bradic, J., Fan, J. and Wang, W. (2011). Penalized composite quasi-likelihood for ultrahigh dimensional variable selection. J. R. Stat. Soc. Ser. B. Stat. Methodol. 73 325–349.
  • Clarke, F. H. (1983). Optimization and Nonsmooth Analysis. Wiley, New York.
  • Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
  • Fan, J., Li, Q. and Wang, Y. (2014). Robust estimation of high-dimensional mean regression. Preprint. Available at arXiv:1410.2150.
  • Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. Ann. Statist. 32 928–961.
  • Freedman, D. A. and Diaconis, P. (1982). On inconsistent $M$-estimators. Ann. Statist. 10 454–461.
  • Friedman, J., Hastie, T. and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9 432–441.
  • Hampel, F. R. (1968). Contributions to the theory of robust estimation. Ph.D. thesis, Univ. of California, Berkeley.
  • Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. and Stahel, W. A. (1986). Robust Statistics: The Approach Based on Influence Functions. Wiley, New York.
  • He, X. and Shao, Q.-M. (1996). A general Bahadur representation of $M$-estimators and its application to linear regression with nonstochastic designs. Ann. Statist. 24 2608–2630.
  • He, X. and Shao, Q.-M. (2000). On parameters of increasing dimensions. J. Multivariate Anal. 73 120–135.
  • Huber, P. J. (1964). Robust estimation of a location parameter. Ann. Math. Stat. 35 73–101.
  • Huber, P. J. (1973). Robust regression: Asymptotics, conjectures and Monte Carlo. Ann. Statist. 1 799–821.
  • Huber, P. J. (1981). Robust Statistics. Wiley, New York.
  • Lehmann, E. L. and Casella, G. (1998). Theory of Point Estimation. Springer, New York.
  • Li, G., Peng, H. and Zhu, L. (2011). Nonconcave penalized $M$-estimation with a diverging number of parameters. Statist. Sinica 21 391–419.
  • Loh, P. (2016). Supplement to “Statistical consistency and asymptotic normality for high-dimensional robust $M$-estimators.” DOI:10.1214/16-AOS1471SUPP.
  • Loh, P. and Wainwright, M. J. (2014). Support recovery without incoherence: A case for nonconvex regularization. Preprint. Available at arXiv:1412.5632.
  • Loh, P.-L. and Wainwright, M. J. (2015). Regularized $M$-estimators with nonconvexity: Statistical and algorithmic theory for local optima. J. Mach. Learn. Res. 16 559–616.
  • Lozano, A. C. and Meinshausen, N. (2013). Minimum distance estimation for robust high-dimensional regression. Preprint. Available at arXiv:1307.3227.
  • Mammen, E. (1989). Asymptotics with increasing dimension for robust regression with applications to the bootstrap. Ann. Statist. 17 382–400.
  • Maronna, R. A., Martin, R. D. and Yohai, V. J. (2006). Robust Statistics: Theory and Methods. Wiley, Chichester.
  • Mendelson, S. (2014). Learning without concentration for general loss functions. Preprint. Available at arXiv:1410.3192.
  • Negahban, S. N., Ravikumar, P., Wainwright, M. J. and Yu, B. (2012). A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers. Statist. Sci. 27 538–557.
  • Nesterov, Y. (2007). Gradient methods for minimizing composite objective function. CORE Discussion Papers No. 2007076, Université Catholique de Louvain, Center for Operations Research and Econometrics (CORE).
  • Nolan, J. P. (2015). Stable Distributions—Models for Heavy Tailed Data. Birkhauser, Boston.
  • Portnoy, S. (1985). Asymptotic behavior of $M$ estimators of $p$ regression parameters when $p^{2}/n$ is large. II. Normal approximation. Ann. Statist. 13 1403–1417.
  • Raskutti, G., Wainwright, M. J. and Yu, B. (2010). Restricted eigenvalue properties for correlated Gaussian designs. J. Mach. Learn. Res. 11 2241–2259.
  • Ravikumar, P., Wainwright, M. J. and Lafferty, J. D. (2010). High-dimensional Ising model selection using $\ell_{1}$-regularized logistic regression. Ann. Statist. 38 1287–1319.
  • Rousseeuw, P. J. and Leroy, A. M. (2005). Robust Regression and Outlier Detection. Wiley, New York.
  • Rudelson, M. and Zhou, S. (2013). Reconstruction from anisotropic random measurements. IEEE Trans. Inform. Theory 59 3434–3447.
  • Shevlyakov, G., Morgenthaler, S. and Shurygin, A. (2008). Redescending $M$-estimators. J. Statist. Plann. Inference 138 2906–2917.
  • Simpson, D. G., Ruppert, D. and Carroll, R. J. (1992). On one-step GM estimates and stability of inferences in linear regression. J. Amer. Statist. Assoc. 87 439–450.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
  • Tukey, J. W. (1960). A survey of sampling from contaminated distributions. In Contributions to Probability and Statistics 448–485. Stanford Univ. Press, Stanford, CA.
  • Vershynin, R. (2012). Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing 210–268. Cambridge Univ. Press, Cambridge.
  • Wainwright, M. J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using $\ell_{1}$-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory 55 2183–2202.
  • Welsh, A. H. and Ronchetti, E. (2002). A journey in single steps: Robust one-step $M$-estimation in linear regression. J. Statist. Plann. Inference 103 287–310.
  • Yohai, V. J. (1987). High breakdown-point and high efficiency robust estimates for regression. Ann. Statist. 15 642–656.
  • Yohai, V. J. and Maronna, R. A. (1979). Asymptotic behavior of $M$-estimators for the linear model. Ann. Statist. 7 258–268.
  • Zhang, T. (2010). Analysis of multi-stage convex relaxation for sparse regularization. J. Mach. Learn. Res. 11 1081–1107.

Supplemental materials

  • Supplement to “Statistical consistency and asymptotic normality for high-dimensional robust $M$-estimators.” We provide detailed technical proofs for the results stated in the main body of the paper.