The Annals of Statistics

Test for bandedness of high-dimensional covariance matrices and bandwidth estimation

Yumou Qiu and Song Xi Chen

Full-text: Open access

Abstract

Motivated by recent efforts to employ banded matrices to estimate a high-dimensional covariance $\Sigma$, we propose a test for $\Sigma$ being banded with a possibly diverging bandwidth. The test adapts to the “large $p$, small $n$” setting without assuming a specific parametric distribution for the data. We also formulate a consistent estimator for the bandwidth of a banded high-dimensional covariance matrix. The properties of the test and the bandwidth estimator are investigated by theoretical evaluations and simulation studies, as well as an empirical analysis of protein mass spectroscopy data.
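As a reading aid, the notion of a banded covariance can be sketched in a few lines. The snippet below is only an illustration of the population-level definition, under the standard convention that $\Sigma = (\sigma_{ij})$ is banded with bandwidth $k$ when $\sigma_{ij} = 0$ for all $|i - j| > k$. It is not the test statistic or the bandwidth estimator proposed in the paper, and the helper names `band` and `bandwidth` as well as the example matrix are hypothetical.

```python
# Illustrative sketch only: shows what "banded with bandwidth k" means for a
# known matrix. It does NOT implement the paper's test or estimator.

def band(sigma, k):
    """Keep entries with |i - j| <= k and set all other entries to zero."""
    p = len(sigma)
    return [
        [sigma[i][j] if abs(i - j) <= k else 0.0 for j in range(p)]
        for i in range(p)
    ]


def bandwidth(sigma, tol=0.0):
    """Largest |i - j| at which an entry exceeds tol in absolute value."""
    p = len(sigma)
    offsets = [
        abs(i - j)
        for i in range(p)
        for j in range(p)
        if abs(sigma[i][j]) > tol
    ]
    return max(offsets, default=0)


# Hypothetical example: unit variances, covariance 0.4 between neighbours,
# zero elsewhere, so the matrix is banded with bandwidth 1.
p = 6
sigma = [
    [1.0 if i == j else 0.4 if abs(i - j) == 1 else 0.0 for j in range(p)]
    for i in range(p)
]

is_banded_at_1 = band(sigma, 1) == sigma  # banding at k = 1 changes nothing
k_true = bandwidth(sigma)                 # bandwidth of the example matrix
```

In practice $\Sigma$ is unknown and must be inferred from $n$ observations of dimension $p$, which is the problem the test and the bandwidth estimator address.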

Article information

Source
Ann. Statist., Volume 40, Number 3 (2012), 1285-1314.

Dates
First available in Project Euclid: 10 August 2012

Permanent link to this document
https://projecteuclid.org/euclid.aos/1344610584

Digital Object Identifier
doi:10.1214/12-AOS1002

Mathematical Reviews number (MathSciNet)
MR3015026

Zentralblatt MATH identifier
1257.62064

Subjects
Primary: 62H15: Hypothesis testing
Secondary: 62G10: Hypothesis testing; 62G20: Asymptotic properties

Keywords
Banded covariance matrix; bandwidth estimation; high data dimension; large $p$, small $n$; nonparametric

Citation

Qiu, Yumou; Chen, Song Xi. Test for bandedness of high-dimensional covariance matrices and bandwidth estimation. Ann. Statist. 40 (2012), no. 3, 1285--1314. doi:10.1214/12-AOS1002. https://projecteuclid.org/euclid.aos/1344610584


