The Annals of Statistics

The spectrum of kernel random matrices

Noureddine El Karoui
Source: Ann. Statist. Volume 38, Number 1 (2010), 1-50.

Abstract

We place ourselves in the setting of high-dimensional statistical inference where the number of variables p in a dataset of interest is of the same order of magnitude as the number of observations n.

We consider the spectrum of certain kernel random matrices, in particular $n \times n$ matrices whose $(i, j)$th entry is $f(X_i'X_j/p)$ or $f(\|X_i - X_j\|_2^2/p)$, where $p$ is the dimension of the data and the $X_i$ are independent data vectors. Here $f$ is assumed to be a locally smooth function.
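
The two matrix families above can be built directly from a data matrix. What follows is a minimal sketch, assuming the data vectors $X_i$ are stacked as the rows of an $n \times p$ NumPy array `X` and that `f` acts entrywise on arrays. The helper names `inner_product_kernel` and `distance_kernel`, the Gaussian choice $f(t) = e^{-t/2}$ for the distance kernel, and the standard normal data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def inner_product_kernel(X, f):
    """Hypothetical helper: n x n matrix with (i, j) entry f(X_i' X_j / p)."""
    p = X.shape[1]
    return f(X @ X.T / p)

def distance_kernel(X, f):
    """Hypothetical helper: n x n matrix with (i, j) entry f(||X_i - X_j||_2^2 / p)."""
    p = X.shape[1]
    sq_norms = np.sum(X * X, axis=1)
    # ||X_i - X_j||^2 = ||X_i||^2 + ||X_j||^2 - 2 X_i'X_j
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negative round-off
    np.fill_diagonal(sq_dists, 0.0)          # exact zeros on the diagonal
    return f(sq_dists / p)

rng = np.random.default_rng(0)
n, p = 300, 150  # n and p of the same order of magnitude
X = rng.standard_normal((n, p))  # assumed data model: i.i.d. N(0, 1) entries

M_inner = inner_product_kernel(X, np.exp)
M_dist = distance_kernel(X, lambda t: np.exp(-t / 2))  # Gaussian kernel

# Spectra of the two kernel random matrices (eigvalsh returns ascending order).
eig_inner = np.linalg.eigvalsh(M_inner)
eig_dist = np.linalg.eigvalsh(M_dist)
```

Computing squared distances through the Gram matrix avoids forming all $n^2$ difference vectors explicitly; the clipping step only guards against floating-point round-off.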

The study is motivated by questions arising in statistics and computer science, where these matrices are used to perform, among other things, nonlinear versions of principal component analysis. Surprisingly, we show that in high dimensions, and for the models we analyze, the problem becomes essentially linear, which is at odds with heuristics sometimes used to justify the use of these methods. The analysis also highlights certain peculiarities of models widely studied in random matrix theory and raises some questions about their relevance as tools to model high-dimensional data encountered in practice.
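
A heuristic way to see why the inner-product case can become "essentially linear" is a Taylor expansion. This is a sketch written for illustration, not the paper's theorem statement; see the full text for the precise results and conditions. Assume the $X_i$ have i.i.d. $N(0,1)$ entries. Then the off-diagonal arguments $X_i'X_j/p$ are of order $p^{-1/2}$, so $f$ can be expanded around 0, with the quadratic term replaced by its mean $1/p$. The diagonal arguments $X_i'X_i/p$ concentrate near 1, which only shifts the diagonal. The hypothetical helper `linearized_inner_kernel` below builds this surrogate and compares its spectrum with that of the nonlinear kernel matrix for $f = \exp$.

```python
import numpy as np

def linearized_inner_kernel(X, f0, f1, f2, f_at_1):
    """Hypothetical helper: heuristic linear surrogate for f(X_i'X_j/p).

    Assumes rows of X have i.i.d. N(0, 1) entries. f0, f1, f2 are f(0), f'(0),
    f''(0); f_at_1 is f(1), the typical value of the diagonal arguments.
    """
    n, p = X.shape
    ones = np.ones((n, n))
    # Off-diagonal: f(0) + f'(0) X_i'X_j/p + f''(0)/2 * E[(X_i'X_j/p)^2],
    # where E[(X_i'X_j/p)^2] = 1/p under the Gaussian assumption.
    # Diagonal: X_i'X_i/p is close to 1, so add f(1) - f(0) - f'(0).
    return ((f0 + f2 / (2 * p)) * ones
            + f1 * (X @ X.T) / p
            + (f_at_1 - f0 - f1) * np.eye(n))

rng = np.random.default_rng(1)
n, p = 200, 400
X = rng.standard_normal((n, p))

M = np.exp(X @ X.T / p)  # nonlinear kernel matrix with f = exp
# For f = exp: f(0) = f'(0) = f''(0) = 1 and f(1) = e.
M_lin = linearized_inner_kernel(X, 1.0, 1.0, 1.0, np.e)

err = np.linalg.norm(M - M_lin, 2)  # operator (spectral) norm of the difference
# Weyl's inequality: sorted eigenvalues differ by at most the operator norm.
eig_gap = np.max(np.abs(np.linalg.eigvalsh(M) - np.linalg.eigvalsh(M_lin)))
```

In this sketch the remaining nonlinearity is absorbed into a constant (rank-one) term and a diagonal shift, so the spectrum of the kernel matrix is, up to the error `err`, that of a rescaled sample Gram matrix $XX'/p$. For an affine $f(x) = a + bx$, the surrogate coincides with the kernel matrix exactly.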

Primary Subjects: 62H10
Secondary Subjects: 60F99
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1262271608
Digital Object Identifier: doi:10.1214/08-AOS648
Zentralblatt MATH identifier: 1181.62078
Mathematical Reviews number (MathSciNet): MR2589315

2012 © Institute of Mathematical Statistics
