Source: Ann. Statist. Volume 38, Number 1 (2010), 1–50.
We place ourselves in the setting of high-dimensional statistical inference where the number of variables p in a dataset of interest is of the same order of magnitude as the number of observations n.
We consider the spectrum of certain kernel random matrices, in particular $n \times n$ matrices whose $(i, j)$th entry is $f(X_i' X_j / p)$ or $f(\|X_i - X_j\|^2 / p)$, where $p$ is the dimension of the data and the $X_i$ are independent data vectors. Here $f$ is assumed to be a locally smooth function.
The study is motivated by questions arising in statistics and computer science where these matrices are used to perform, among other things, nonlinear versions of principal component analysis. Surprisingly, we show that in high dimensions, and for the models we analyze, the problem becomes essentially linear, which is at odds with heuristics sometimes used to justify the use of these methods. The analysis also highlights certain peculiarities of models widely studied in random matrix theory and raises some questions about their relevance as tools to model high-dimensional data encountered in practice.
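To make the two matrix definitions concrete, here is a minimal numerical sketch. It builds both kernel matrices from the formulas in the abstract and compares the off-diagonal entries of the inner-product kernel with a first-order Taylor expansion of f around 0. Everything specific is an assumption made for illustration and not taken from the paper: the Gaussian data, the choice f = exp, the sizes n = 200 and p = 400, and the helper name `kernel_matrices`. The entrywise comparison is only a heuristic look at the "essentially linear" behaviour; it is not the paper's spectral result.

```python
import numpy as np


def kernel_matrices(X, f):
    """Return the inner-product and distance kernel matrices for the rows of X.

    Hypothetical helper for illustration: entry (i, j) is f(X_i' X_j / p)
    for the first matrix and f(||X_i - X_j||^2 / p) for the second.
    """
    n, p = X.shape
    gram = X @ X.T / p                       # entries X_i' X_j / p
    sq_norms = np.diag(gram)                 # entries ||X_i||^2 / p
    # ||X_i - X_j||^2 / p via ||a||^2 + ||b||^2 - 2 a'b
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * gram
    sq_dists = np.maximum(sq_dists, 0.0)     # guard against rounding below 0
    np.fill_diagonal(sq_dists, 0.0)          # exact zeros on the diagonal
    return f(gram), f(sq_dists)


# Assumed setup: i.i.d. standard Gaussian entries, p of the same order as n.
rng = np.random.default_rng(0)
n, p = 200, 400
X = rng.standard_normal((n, p))

M_inner, M_dist = kernel_matrices(X, np.exp)

# First-order Taylor expansion of f = exp at 0: f(0) + f'(0) x = 1 + x.
gram = X @ X.T / p
taylor = 1.0 + gram
off_diag = ~np.eye(n, dtype=bool)
max_gap = np.max(np.abs(M_inner - taylor)[off_diag])

print("largest off-diagonal gap to the linear expansion:", max_gap)
```

Off the diagonal the arguments X_i'X_j/p are small when p is large, so the nonlinear kernel entries stay close to their linear expansion. Entrywise closeness alone does not control eigenvalues of an n×n matrix, which is why the statement in the abstract requires a genuine spectral analysis.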