Bernoulli, Volume 23, Number 4A (2017), 2466-2532.

Spectral analysis of high-dimensional sample covariance matrices with missing observations

Kamil Jurczak and Angelika Rohde

Abstract

We study high-dimensional sample covariance matrices based on independent random vectors with missing coordinates. The presence of missing observations is common in modern applications such as climate studies or gene expression micro-arrays. A weak approximation on the spectral distribution in the “large dimension $d$ and large sample size $n$” asymptotics is derived for possibly different observation probabilities in the coordinates. The spectral distribution turns out to be strongly influenced by the missingness mechanism. In the null case under the missing at random scenario where each component is observed with the same probability $p$, the limiting spectral distribution is a Marčenko–Pastur law shifted by $(1-p)/p$ to the left. As $d/n\rightarrow y\in(0,1)$, the almost sure convergence of the extremal eigenvalues to the respective boundary points of the support of the limiting spectral distribution is proved, which are explicitly given in terms of $y$ and $p$. Eventually, the sample covariance matrix is positive definite if $p$ is larger than

\[1-(1-\sqrt{y})^{2},\]
whereas this is not true any longer if $p$ is smaller than this quantity.
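
The threshold stated in the abstract can be evaluated numerically. The following is a minimal sketch, not taken from the article: the function names are hypothetical, and the formula for the support edges is an assumption, namely that the limiting law in the null case is a Marčenko–Pastur law with ratio $y$, scaled by $1/p$ and shifted by $(1-p)/p$ to the left. The abstract states only that the boundary points are explicit in $y$ and $p$. Under that assumption the lower edge is positive exactly when $p>1-(1-\sqrt{y})^{2}$, which agrees with the threshold above.

```python
import math


def pd_threshold(y: float) -> float:
    """Threshold 1 - (1 - sqrt(y))^2 from the abstract, for y in (0, 1)."""
    if not 0.0 < y < 1.0:
        raise ValueError("y must lie in (0, 1)")
    return 1.0 - (1.0 - math.sqrt(y)) ** 2


def assumed_support_edges(y: float, p: float) -> tuple[float, float]:
    """Support edges under an ASSUMED form of the limiting law.

    Assumption (not stated in the abstract): the limit is a
    Marcenko-Pastur law with ratio y, scaled by 1/p and shifted
    by (1 - p)/p to the left.
    """
    if not 0.0 < y < 1.0:
        raise ValueError("y must lie in (0, 1)")
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    shift = (1.0 - p) / p
    lower = (1.0 - math.sqrt(y)) ** 2 / p - shift
    upper = (1.0 + math.sqrt(y)) ** 2 / p - shift
    return lower, upper


if __name__ == "__main__":
    y = 0.25
    print("threshold for y = 0.25:", pd_threshold(y))
    for p in (0.6, 0.75, 0.9, 1.0):
        lower, upper = assumed_support_edges(y, p)
        print(f"p = {p}: assumed support edges [{lower:.4f}, {upper:.4f}]")
```

For $y=0.25$ the threshold is $0.75$: under the assumed form, the lower edge is negative for $p<0.75$, zero at $p=0.75$, and positive above it. With $p=1$ the edges reduce to the classical Marčenko–Pastur values $(1\mp\sqrt{y})^{2}$.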

Article information

Source
Bernoulli, Volume 23, Number 4A (2017), 2466-2532.

Dates
Received: October 2015
First available in Project Euclid: 9 May 2017

Permanent link to this document
https://projecteuclid.org/euclid.bj/1494316823

Digital Object Identifier
doi:10.3150/16-BEJ815

Mathematical Reviews number (MathSciNet)
MR3648036

Zentralblatt MATH identifier
06778247

Keywords
almost sure convergence of extremal eigenvalues; characterization of positive definiteness; limiting spectral distribution; sample covariance matrix with missing observations; Stieltjes transform

Citation

Jurczak, Kamil; Rohde, Angelika. Spectral analysis of high-dimensional sample covariance matrices with missing observations. Bernoulli 23 (2017), no. 4A, 2466-2532. doi:10.3150/16-BEJ815. https://projecteuclid.org/euclid.bj/1494316823

