The Annals of Statistics

On information plus noise kernel random matrices

Noureddine El Karoui

Abstract

Kernel random matrices have attracted a lot of interest in recent years, from both practical and theoretical standpoints. Most of the theoretical work so far has focused on the case where the data is sampled from a low-dimensional structure. Very recently, the first results concerning kernel random matrices with high-dimensional input data were obtained, in a setting where the data was sampled from a genuinely high-dimensional structure, similar to the standard assumptions of random matrix theory.

In this paper, we consider the case where the data is of the type “information + noise.” In other words, each observation is the sum of two independent components: a signal part, sampled from a “low-dimensional” structure, and a high-dimensional noise part, normalized so that it does not overwhelm the signal but still affects it. We consider two types of noise, spherical and elliptical.
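
To fix ideas, here is a minimal sketch of the spherical model, in notation chosen for this summary (the paper's own normalization may differ):

    % Information-plus-noise model (illustrative notation, not
    % necessarily the paper's): each observation in R^p is a signal
    % vector plus scaled high-dimensional noise.
    \[
      X_i = Y_i + Z_i, \qquad Z_i = \frac{\sigma}{\sqrt{p}}\,\varepsilon_i,
      \qquad i = 1, \dots, n,
    \]
    % where the Y_i lie on a low-dimensional structure and \varepsilon_i
    % has i.i.d. mean-zero, variance-one entries, so \|Z_i\|_2
    % concentrates around \sigma: the noise neither vanishes nor
    % overwhelms the O(1)-scale signal.

Under this scaling, concentration of measure heuristically gives \|X_i - X_j\|^2 \approx \|Y_i - Y_j\|^2 + 2\sigma^2 for i \neq j, which is what produces the "slightly different kernel" described next.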

In the spherical setting, we show that the spectral properties of kernel random matrices can be understood from a new kernel matrix, computed only from the signal part of the data, but using (in general) a slightly different kernel. The Gaussian kernel has some special properties in this setting.
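
A small simulation makes this concrete. The script below is only a sketch under specifics assumed here for illustration (signal on a circle, Gaussian noise, Gaussian kernel with bandwidth tau, all our choices, not the paper's): it compares the spectrum of the Gaussian kernel matrix built from the noisy data with that of the signal-only kernel matrix rescaled by exp(-sigma^2/tau^2), the modification suggested by the heuristic above.

    import numpy as np

    # Illustrative simulation of the spherical information-plus-noise
    # setting. The circle signal, Gaussian noise and bandwidth tau are
    # assumptions made for this sketch, not the paper's exact setup.
    rng = np.random.default_rng(0)
    n, p, sigma, tau = 200, 2000, 1.0, 1.0

    # Signal: n points on the unit circle, embedded in the first two of
    # p coordinates.
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    Y = np.zeros((n, p))
    Y[:, 0], Y[:, 1] = np.cos(theta), np.sin(theta)

    # Spherical noise, scaled so ||Z_i|| concentrates around sigma.
    Z = sigma * rng.standard_normal((n, p)) / np.sqrt(p)
    X = Y + Z

    def gaussian_kernel(A, tau):
        """K_ij = exp(-||A_i - A_j||^2 / (2 tau^2))."""
        sq = np.sum(A * A, axis=1)
        d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * A @ A.T, 0.0)
        return np.exp(-d2 / (2.0 * tau ** 2))

    K_noisy = gaussian_kernel(X, tau)

    # Heuristic prediction: off the diagonal, ||X_i - X_j||^2 is close
    # to ||Y_i - Y_j||^2 + 2 sigma^2, so the noisy Gaussian kernel
    # matrix is close to exp(-sigma^2/tau^2) times the signal-only one
    # (the diagonal stays exactly 1).
    K_pred = np.exp(-sigma ** 2 / tau ** 2) * gaussian_kernel(Y, tau)
    np.fill_diagonal(K_pred, 1.0)

    top = lambda M, k=5: np.linalg.eigvalsh(M)[::-1][:k]
    print("top eigenvalues, noisy kernel:    ", np.round(top(K_noisy), 3))
    print("top eigenvalues, predicted kernel:", np.round(top(K_pred), 3))

For a general kernel f(||x - y||^2), the same heuristic suggests the modified kernel x -> f(x + 2 sigma^2); for the Gaussian kernel this shift amounts to a rescaling of the off-diagonal entries, which is one way to read the remark that the Gaussian kernel is special in this setting.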

The elliptical setting, which is important from a robustness standpoint, does not lend itself to as simple an interpretation.

Article information

Source
Ann. Statist., Volume 38, Number 5 (2010), 3191–3216.

Dates
First available in Project Euclid: 13 September 2010

Permanent link to this document
https://projecteuclid.org/euclid.aos/1284391762

Digital Object Identifier
doi:10.1214/10-AOS801

Mathematical Reviews number (MathSciNet)
MR2722468

Zentralblatt MATH identifier
1200.62056

Subjects
Primary: 62H10: Distribution of statistics
Secondary: 60F99: None of the above, but in this section

Keywords
Kernel matrices; multivariate statistical analysis; high-dimensional inference; random matrix theory; machine learning; concentration of measure

Citation

El Karoui, Noureddine. On information plus noise kernel random matrices. Ann. Statist. 38 (2010), no. 5, 3191–3216. doi:10.1214/10-AOS801. https://projecteuclid.org/euclid.aos/1284391762


