## The Annals of Statistics

### Statistical consistency and asymptotic normality for high-dimensional robust $M$-estimators

Po-Ling Loh

#### Abstract

We study theoretical properties of regularized robust $M$-estimators, applicable when data are drawn from a sparse high-dimensional linear model and contaminated by heavy-tailed distributions and/or outliers in the additive errors and covariates. We first establish a form of local statistical consistency for the penalized regression estimators under fairly mild conditions on the error distribution: When the loss function has a bounded derivative and satisfies a local restricted curvature condition, all stationary points within a constant radius of the true regression vector converge at the minimax rate enjoyed by the Lasso with sub-Gaussian errors. When an appropriate nonconvex regularizer is used in place of an $\ell_{1}$-penalty, we show that such stationary points are in fact unique and equal to the local oracle solution with the correct support; hence, results on asymptotic normality in the low-dimensional case carry over immediately to the high-dimensional setting. This has important implications for the efficiency of regularized nonconvex $M$-estimators when the errors are heavy-tailed. Our analysis of the local curvature of the loss function also has useful consequences for optimization when the robust regression function and/or regularizer is nonconvex and the objective function possesses stationary points outside the local region. We show that as long as a composite gradient descent algorithm is initialized within a constant radius of the true regression vector, successive iterates will converge at a linear rate to a stationary point within the local region. Furthermore, the global optimum of a convex regularized robust regression function may be used to obtain a suitable initialization. The result is a novel two-step procedure that uses a convex $M$-estimator to achieve consistency and a nonconvex $M$-estimator to increase efficiency. We conclude with simulation results that corroborate our theoretical findings.
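The two-step procedure described in the abstract is straightforward to prototype. The sketch below is illustrative rather than the paper's exact implementation: it pairs the Huber loss (whose derivative is bounded, as the theory requires) with an $\ell_1$ penalty in the first step and an MCP penalty (one representative nonconvex regularizer) in the second, running composite gradient descent in both. All function names and tuning constants (`delta`, `lam`, `gamma`, `eta`) are assumptions chosen for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def huber_grad(X, y, beta, delta):
    """Gradient of (1/n) * sum_i ell_delta(y_i - x_i^T beta), where ell_delta
    is the Huber loss; its derivative psi(r) = clip(r, -delta, delta) is bounded."""
    r = y - X @ beta
    return -X.T @ np.clip(r, -delta, delta) / len(y)

def soft_threshold(z, t):
    """Elementwise proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def mcp_prox(z, lam, gamma, eta):
    """Firm thresholding: prox of the MCP penalty at step size eta (needs gamma > eta)."""
    shrunk = soft_threshold(z, eta * lam) / (1.0 - eta / gamma)
    return np.where(np.abs(z) > gamma * lam, z, shrunk)

def composite_gd(X, y, prox, delta, eta, n_iter=1000, beta0=None):
    """Composite gradient descent: a gradient step on the Huber loss followed
    by the proximal map of the regularizer."""
    beta = np.zeros(X.shape[1]) if beta0 is None else beta0.copy()
    for _ in range(n_iter):
        beta = prox(beta - eta * huber_grad(X, y, beta, delta))
    return beta

# Simulated sparse linear model with heavy-tailed (Student-t) errors.
n, p, s = 200, 500, 5
X = rng.standard_normal((n, p))
beta_star = np.zeros(p)
beta_star[:s] = 1.0
y = X @ beta_star + rng.standard_t(df=2, size=n)

delta = 1.345                                # Huber tuning constant (assumed)
lam = 2.0 * np.sqrt(np.log(p) / n)           # regularization at the minimax scale
eta = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)  # 1 / Lipschitz constant of the gradient
gamma = 3.0                                  # MCP concavity parameter (assumed)

# Step 1: convex l1-penalized Huber regression from a cold start -> consistent pilot.
beta_pilot = composite_gd(X, y, lambda z: soft_threshold(z, eta * lam), delta, eta)

# Step 2: nonconvex (MCP-penalized) Huber regression warm-started at the pilot.
beta_hat = composite_gd(X, y, lambda z: mcp_prox(z, lam, gamma, eta), delta, eta,
                        beta0=beta_pilot)

print("pilot error:", np.linalg.norm(beta_pilot - beta_star))
print("final error:", np.linalg.norm(beta_hat - beta_star))
```

Warm-starting the second step at the convex pilot mirrors the theory: the pilot lands within a constant radius of the true regression vector, so the subsequent composite gradient iterates converge at a linear rate to the unique stationary point (the local oracle solution) in that region.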

#### Article information

Source
Ann. Statist. Volume 45, Number 2 (2017), 866–896.

Dates
Revised: April 2016
First available in Project Euclid: 16 May 2017

https://projecteuclid.org/euclid.aos/1494921960

Digital Object Identifier
doi:10.1214/16-AOS1471

Zentralblatt MATH identifier
1371.62023

Subjects
Primary: 62F12: Asymptotic properties of estimators

#### Citation

Loh, Po-Ling. Statistical consistency and asymptotic normality for high-dimensional robust $M$-estimators. Ann. Statist. 45 (2017), no. 2, 866–896. doi:10.1214/16-AOS1471. https://projecteuclid.org/euclid.aos/1494921960

#### References

• Agarwal, A., Negahban, S. and Wainwright, M. J. (2012). Fast global convergence of gradient methods for high-dimensional statistical recovery. Ann. Statist. 40 2452–2482.
• Bai, Z. D. and Wu, Y. (1997). General $M$-estimation. J. Multivariate Anal. 63 119–135.
• Bertsekas, D. P. (1999). Nonlinear Programming. Athena Scientific, Belmont, MA.
• Bickel, P. J. (1975). One-step Huber estimates in the linear model. J. Amer. Statist. Assoc. 70 428–434.
• Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
• Box, G. E. P. (1953). Non-normality and tests on variances. Biometrika 40 318–335.
• Bradic, J., Fan, J. and Wang, W. (2011). Penalized composite quasi-likelihood for ultrahigh dimensional variable selection. J. R. Stat. Soc. Ser. B. Stat. Methodol. 73 325–349.
• Clarke, F. H. (1983). Optimization and Nonsmooth Analysis. Wiley, New York.
• Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
• Fan, J., Li, Q. and Wang, Y. (2014). Robust estimation of high-dimensional mean regression. Preprint. Available at arXiv:1410.2150.
• Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. Ann. Statist. 32 928–961.
• Freedman, D. A. and Diaconis, P. (1982). On inconsistent $M$-estimators. Ann. Statist. 10 454–461.
• Friedman, J., Hastie, T. and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9 432–441.
• Hampel, F. R. (1968). Contributions to the theory of robust estimation. Ph.D. thesis, Univ. of California, Berkeley.
• Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. and Stahel, W. A. (1986). Robust Statistics: The Approach Based on Influence Functions. Wiley, New York.
• He, X. and Shao, Q.-M. (1996). A general Bahadur representation of $M$-estimators and its application to linear regression with nonstochastic designs. Ann. Statist. 24 2608–2630.
• He, X. and Shao, Q.-M. (2000). On parameters of increasing dimensions. J. Multivariate Anal. 73 120–135.
• Huber, P. J. (1964). Robust estimation of a location parameter. Ann. Math. Stat. 35 73–101.
• Huber, P. J. (1973). Robust regression: Asymptotics, conjectures and Monte Carlo. Ann. Statist. 1 799–821.
• Huber, P. J. (1981). Robust Statistics. Wiley, New York.
• Lehmann, E. L. and Casella, G. (1998). Theory of Point Estimation. Springer, New York.
• Li, G., Peng, H. and Zhu, L. (2011). Nonconcave penalized $M$-estimation with a diverging number of parameters. Statist. Sinica 21 391–419.
• Loh, P.-L. (2016). Supplement to “Statistical consistency and asymptotic normality for high-dimensional robust $M$-estimators.” DOI:10.1214/16-AOS1471SUPP.
• Loh, P.-L. and Wainwright, M. J. (2014). Support recovery without incoherence: A case for nonconvex regularization. Preprint. Available at arXiv:1412.5632.
• Loh, P.-L. and Wainwright, M. J. (2015). Regularized $M$-estimators with nonconvexity: Statistical and algorithmic theory for local optima. J. Mach. Learn. Res. 16 559–616.
• Lozano, A. C. and Meinshausen, N. (2013). Minimum distance estimation for robust high-dimensional regression. Preprint. Available at arXiv:1307.3227.
• Mammen, E. (1989). Asymptotics with increasing dimension for robust regression with applications to the bootstrap. Ann. Statist. 17 382–400.
• Maronna, R. A., Martin, R. D. and Yohai, V. J. (2006). Robust Statistics: Theory and Methods. Wiley, Chichester.
• Mendelson, S. (2014). Learning without concentration for general loss functions. Preprint. Available at arXiv:1410.3192.
• Negahban, S. N., Ravikumar, P., Wainwright, M. J. and Yu, B. (2012). A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers. Statist. Sci. 27 538–557.
• Nesterov, Y. (2007). Gradient methods for minimizing composite objective function. CORE Discussion Papers No. 2007076, Université Catholique de Louvain, Center for Operations Research and Econometrics (CORE).
• Nolan, J. P. (2015). Stable Distributions—Models for Heavy Tailed Data. Birkhäuser, Boston.
• Portnoy, S. (1985). Asymptotic behavior of $M$ estimators of $p$ regression parameters when $p^{2}/n$ is large. II. Normal approximation. Ann. Statist. 13 1403–1417.
• Raskutti, G., Wainwright, M. J. and Yu, B. (2010). Restricted eigenvalue properties for correlated Gaussian designs. J. Mach. Learn. Res. 11 2241–2259.
• Ravikumar, P., Wainwright, M. J. and Lafferty, J. D. (2010). High-dimensional Ising model selection using $\ell_{1}$-regularized logistic regression. Ann. Statist. 38 1287–1319.
• Rousseeuw, P. J. and Leroy, A. M. (2005). Robust Regression and Outlier Detection. Wiley, New York.
• Rudelson, M. and Zhou, S. (2013). Reconstruction from anisotropic random measurements. IEEE Trans. Inform. Theory 59 3434–3447.
• Shevlyakov, G., Morgenthaler, S. and Shurygin, A. (2008). Redescending $M$-estimators. J. Statist. Plann. Inference 138 2906–2917.
• Simpson, D. G., Ruppert, D. and Carroll, R. J. (1992). On one-step GM estimates and stability of inferences in linear regression. J. Amer. Statist. Assoc. 87 439–450.
• Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
• Tukey, J. W. (1960). A survey of sampling from contaminated distributions. In Contributions to Probability and Statistics 448–485. Stanford Univ. Press, Stanford, CA.
• Vershynin, R. (2012). Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing 210–268. Cambridge Univ. Press, Cambridge.
• Wainwright, M. J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using $\ell_{1}$-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory 55 2183–2202.
• Welsh, A. H. and Ronchetti, E. (2002). A journey in single steps: Robust one-step $M$-estimation in linear regression. J. Statist. Plann. Inference 103 287–310.
• Yohai, V. J. (1987). High breakdown-point and high efficiency robust estimates for regression. Ann. Statist. 15 642–656.
• Yohai, V. J. and Maronna, R. A. (1979). Asymptotic behavior of $M$-estimators for the linear model. Ann. Statist. 7 258–268.
• Zhang, T. (2010a). Analysis of multi-stage convex relaxation for sparse regularization. J. Mach. Learn. Res. 11 1081–1107.

#### Supplemental materials

• Supplement to “Statistical consistency and asymptotic normality for high-dimensional robust $M$-estimators.” We provide detailed technical proofs for the results stated in the main body of the paper.