## Statistical Science

### User-Friendly Covariance Estimation for Heavy-Tailed Distributions

#### Abstract

We provide a survey of recent results on covariance estimation for heavy-tailed distributions. By unifying ideas scattered in the literature, we propose user-friendly methods that facilitate practical implementation. Specifically, we introduce elementwise and spectrumwise truncation operators, as well as their $M$-estimator counterparts, to robustify the sample covariance matrix. Unlike the classical notion of robustness, which is characterized by the breakdown property, we focus on tail robustness, evidenced by the connection between nonasymptotic deviation bounds and the confidence level. The key insight is that estimators should adapt to the sample size, dimensionality and noise level to achieve an optimal tradeoff between bias and robustness. Furthermore, we propose data-driven procedures that automatically calibrate the tuning parameters. We demonstrate their applications to a series of structured models in high dimensions, including bandable and low-rank covariance matrices and sparse precision matrices. Numerical studies lend strong support to the proposed methods.

#### Article information

Source
Statist. Sci., Volume 34, Number 3 (2019), 454–471.

Dates
First available in Project Euclid: 11 October 2019

https://projecteuclid.org/euclid.ss/1570780979

Digital Object Identifier
doi:10.1214/19-STS711

Mathematical Reviews number (MathSciNet)
MR4017523

Zentralblatt MATH identifier
07162132

#### Citation

Ke, Yuan; Minsker, Stanislav; Ren, Zhao; Sun, Qiang; Zhou, Wen-Xin. User-Friendly Covariance Estimation for Heavy-Tailed Distributions. Statist. Sci. 34 (2019), no. 3, 454–471. doi:10.1214/19-STS711. https://projecteuclid.org/euclid.ss/1570780979

#### References

• Avella-Medina, M., Battey, H. S., Fan, J. and Li, Q. (2018). Robust estimation of high-dimensional covariance and precision matrices. Biometrika 105 271–284.
• Bickel, P. J. and Levina, E. (2008a). Regularized estimation of large covariance matrices. Ann. Statist. 36 199–227.
• Bickel, P. J. and Levina, E. (2008b). Covariance regularization by thresholding. Ann. Statist. 36 2577–2604.
• Brownlees, C., Joly, E. and Lugosi, G. (2015). Empirical risk minimization for heavy-tailed losses. Ann. Statist. 43 2507–2536.
• Butler, R. W., Davies, P. L. and Jhun, M. (1993). Asymptotics for the minimum covariance determinant estimator. Ann. Statist. 21 1385–1400.
• Cai, T., Liu, W. and Luo, X. (2011). A constrained $\ell_{1}$ minimization approach to sparse precision matrix estimation. J. Amer. Statist. Assoc. 106 594–607.
• Cai, T. T., Ren, Z. and Zhou, H. H. (2016). Estimating structured high-dimensional covariance and precision matrices: Optimal rates and adaptive estimation. Electron. J. Stat. 10 1–59.
• Cai, T. T., Zhang, C.-H. and Zhou, H. H. (2010). Optimal rates of convergence for covariance matrix estimation. Ann. Statist. 38 2118–2144.
• Catoni, O. (2012). Challenging the empirical mean and empirical variance: A deviation study. Ann. Inst. Henri Poincaré Probab. Stat. 48 1148–1185.
• Catoni, O. (2016). PAC-Bayesian bounds for the Gram matrix and least squares regression with a random design. Preprint. Available at arXiv:1603.05229.
• Chen, M., Gao, C. and Ren, Z. (2018). Robust covariance and scatter matrix estimation under Huber’s contamination model. Ann. Statist. 46 1932–1960.
• Chen, X. and Zhou, W.-X. (2019). Robust inference via multiplier bootstrap. Ann. Statist. To appear. Available at arXiv:1903.07208.
• Cherapanamjeri, Y., Flammarion, N. and Bartlett, P. L. (2019). Fast mean estimation with sub-Gaussian rates. Preprint. Available at arXiv:1902.01998.
• Cont, R. (2001). Empirical properties of asset returns: Stylized facts and statistical issues. Quant. Finance 1 223–236.
• Davies, L. (1992). The asymptotics of Rousseeuw’s minimum volume ellipsoid estimator. Ann. Statist. 20 1828–1843.
• Devroye, L., Lerasle, M., Lugosi, G. and Oliveira, R. I. (2016). Sub-Gaussian mean estimators. Ann. Statist. 44 2695–2725.
• Eklund, A., Nichols, T. E. and Knutsson, H. (2016). Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. Proc. Natl. Acad. Sci. USA 113 7900–7905.
• Fan, J., Liao, Y. and Liu, H. (2016). An overview of the estimation of large covariance and precision matrices. Econom. J. 19 C1–C32.
• Fan, J., Liu, H., Sun, Q. and Zhang, T. (2018). I-LAMM for sparse learning: Simultaneous control of algorithmic complexity and statistical error. Ann. Statist. 46 814–841.
• Fan, J., Sun, Q., Zhou, W.-X. and Zhu, Z. (2019). Principal component analysis for big data. Wiley StatsRef: Statistics Reference Online. To appear. DOI:10.1002/9781118445112.stat08122.
• Hall, P., Kay, J. W. and Titterington, D. M. (1990). Asymptotically optimal difference-based estimation of variance in nonparametric regression. Biometrika 77 521–528.
• Hampel, F. R. (1971). A general qualitative definition of robustness. Ann. Math. Stat. 42 1887–1896.
• Hopkins, S. B. (2018). Mean estimation with sub-Gaussian rates in polynomial time. Preprint. Available at arXiv:1809.07425.
• Hsu, D. and Sabato, S. (2016). Loss minimization and parameter estimation with heavy tails. J. Mach. Learn. Res. 17 Paper No. 18, 40 pp.
• Huber, P. J. (1964). Robust estimation of a location parameter. Ann. Math. Stat. 35 73–101.
• Hubert, M., Rousseeuw, P. J. and Van Aelst, S. (2008). High-breakdown robust multivariate methods. Statist. Sci. 23 92–119.
• Ke, Y., Minsker, S., Ren, Z., Sun, Q. and Zhou, W.-X. (2019). Supplement to “User-Friendly Covariance Estimation for Heavy-Tailed Distributions.” DOI:10.1214/19-STS711SUPP.
• Lepski, O. V. and Spokoiny, V. G. (1997). Optimal pointwise adaptive methods in nonparametric estimation. Ann. Statist. 25 2512–2546.
• Lepskiĭ, O. V. (1990). A problem of adaptive estimation in Gaussian white noise. Theory Probab. Appl. 35 454–466.
• Liu, R. Y. (1990). On a notion of data depth based on random simplices. Ann. Statist. 18 405–414.
• Liu, Y. and Ren, Z. (2018). Minimax estimation of large precision matrices with bandable Cholesky factor. Preprint. Available at arXiv:1712.09483.
• Liu, L., Hawkins, D. M., Ghosh, S. and Young, S. S. (2003). Robust singular value decomposition analysis of microarray data. Proc. Natl. Acad. Sci. USA 100 13167–13172.
• Lounici, K. (2014). High-dimensional covariance matrix estimation with missing observations. Bernoulli 20 1029–1058.
• Lugosi, G. and Mendelson, S. (2019). Sub-Gaussian estimators of the mean of a random vector. Ann. Statist. 47 783–794.
• Maronna, R. A. (1976). Robust $M$-estimators of multivariate location and scatter. Ann. Statist. 4 51–67.
• Mendelson, S. and Zhivotovskiy, N. (2018). Robust covariance estimation under $L_{4}$–$L_{2}$ norm equivalence. Preprint. Available at arXiv:1809.10462.
• Minsker, S. (2015). Geometric median and robust estimation in Banach spaces. Bernoulli 21 2308–2335.
• Minsker, S. (2018). Sub-Gaussian estimators of the mean of a random matrix with heavy-tailed entries. Ann. Statist. 46 2871–2903.
• Minsker, S. and Strawn, N. (2017). Distributed statistical estimation and rates of convergence in normal approximation. Preprint. Available at arXiv:1704.02658.
• Minsker, S. and Wei, X. (2018). Robust modifications of $U$-statistics and applications to covariance estimation problems. Preprint. Available at arXiv:1801.05565.
• Mitra, R. and Zhang, C.-H. (2014). Multivariate analysis of nonparametric estimates of large correlation matrices. Preprint. Available at arXiv:1403.6195.
• Mizera, I. (2002). On depth and deep points: A calculus. Ann. Statist. 30 1681–1736.
• Nemirovsky, A. S. and Yudin, D. B. (1983). Problem Complexity and Method Efficiency in Optimization. A Wiley-Interscience Publication. Wiley, New York.
• Portnoy, S. and He, X. (2000). A robust journey in the new millennium. J. Amer. Statist. Assoc. 95 1331–1335.
• Purdom, E. and Holmes, S. P. (2005). Error distribution for gene expression data. Stat. Appl. Genet. Mol. Biol. 4 Art. 16.
• Rice, J. (1984). Bandwidth choice for nonparametric regression. Ann. Statist. 12 1215–1230.
• Rousseeuw, P. J. (1984). Least median of squares regression. J. Amer. Statist. Assoc. 79 871–880.
• Rousseeuw, P. and Yohai, V. (1984). Robust regression by means of S-estimators. In Robust and Nonlinear Time Series Analysis (Heidelberg, 1983). Lect. Notes Stat. 26 256–272. Springer, New York.
• Salibian-Barrera, M. and Zamar, R. H. (2002). Bootstrapping robust estimates of regression. Ann. Statist. 30 556–582.
• Sun, Q., Zhou, W.-X. and Fan, J. (2019). Adaptive Huber regression. J. Amer. Statist. Assoc. DOI:10.1080/01621459.2018.1543124.
• Sun, Q., Tan, K. M., Liu, H. and Zhang, T. (2018). Graphical nonconvex optimization via an adaptive convex relaxation. In Proceedings of the 35th International Conference on Machine Learning 80 4810–4817.
• Tukey, J. W. (1975). Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians (Vancouver, B.C., 1974), Vol. 2 523–531. Canad. Math. Congress, Montreal.
• Tyler, D. E. (1987). A distribution-free $M$-estimator of multivariate scatter. Ann. Statist. 15 234–251.
• Vershynin, R. (2012). Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing 210–268. Cambridge Univ. Press, Cambridge.
• Wang, L., Zheng, C., Zhou, W. and Zhou, W.-X. (2018). A new principle for tuning-free Huber regression. Technical Report.
• Yohai, V. J. (1987). High breakdown-point and high efficiency robust estimates for regression. Ann. Statist. 15 642–656.
• Zhang, T., Cheng, X. and Singer, A. (2016). Marčenko–Pastur law for Tyler’s $M$-estimator. J. Multivariate Anal. 149 114–123.
• Zhang, T. and Zou, H. (2014). Sparse precision matrix estimation via lasso penalized D-trace loss. Biometrika 101 103–120.
• Zuo, Y. and Serfling, R. (2000). General notions of statistical depth function. Ann. Statist. 28 461–482.

#### Supplemental materials

• Supplement to “User-Friendly Covariance Estimation for Heavy-Tailed Distributions”. In this supplement, we provide proofs of all the theoretical results in the main text. In addition, we investigate robust covariance estimation and inference under factor models, which might be of independent interest.