Electronic Journal of Statistics

Kernel ridge vs. principal component regression: Minimax bounds and the qualification of regularization operators

Lee H. Dicker, Dean P. Foster, and Daniel Hsu

Full-text: Open access

Abstract

Regularization is an essential element of virtually all kernel methods for nonparametric regression problems. A critical factor in the effectiveness of a given kernel method is the type of regularization that is employed. This article compares and contrasts members from a general class of regularization techniques, which notably includes ridge regression and principal component regression. We derive an explicit finite-sample risk bound for regularization-based estimators that simultaneously accounts for (i) the structure of the ambient function space, (ii) the regularity of the true regression function, and (iii) the adaptability (or qualification) of the regularization. A simple consequence of this upper bound is that the risk of the regularization-based estimators matches the minimax rate in a variety of settings. The general bound also illustrates how some regularization techniques are more adaptable than others to favorable regularity properties that the true regression function may possess. This, in particular, demonstrates a striking difference between kernel ridge regression and kernel principal component regression. Our theoretical results are supported by numerical experiments.
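To make the contrast concrete: both kernel ridge regression and kernel principal component regression can be viewed as spectral filters applied to the eigenvalues of the empirical kernel operator, with ridge using the Tikhonov filter 1/(t + λ) and PCR using the spectral cut-off filter that inverts only eigenvalues above λ. The following NumPy sketch illustrates this filter view; the Gaussian kernel, synthetic data, bandwidth, and regularization level λ below are arbitrary illustrative choices (not the paper's experimental setup), and the helper names are hypothetical.

    import numpy as np

    def gaussian_kernel(X, Z, bandwidth=0.3):
        # Gaussian (RBF) kernel matrix between the rows of X and Z.
        sq = np.sum(X**2, 1)[:, None] + np.sum(Z**2, 1)[None, :] - 2 * X @ Z.T
        return np.exp(-sq / (2 * bandwidth**2))

    def spectral_filter_fit(K, y, lam, method="ridge"):
        # Dual coefficients alpha of a spectral-filter estimator,
        # f_hat(x) = sum_i alpha_i k(x, x_i), obtained by replacing 1/t with a
        # filter g_lam(t) applied to the eigenvalues t of K/n:
        #   ridge: g_lam(t) = 1 / (t + lam)            (Tikhonov filter)
        #   pcr:   g_lam(t) = 1/t if t >= lam, else 0  (spectral cut-off filter)
        n = len(y)
        evals, evecs = np.linalg.eigh(K / n)
        if method == "ridge":
            g = 1.0 / (evals + lam)
        elif method == "pcr":
            g = np.where(evals >= lam, 1.0 / np.maximum(evals, 1e-12), 0.0)
        else:
            raise ValueError(method)
        return evecs @ (g * (evecs.T @ y)) / n

    # Toy comparison on synthetic data (all settings here are ad hoc).
    rng = np.random.default_rng(0)
    n = 200
    X = rng.uniform(-1.0, 1.0, size=(n, 1))
    f_true = np.sin(np.pi * X[:, 0])
    y = f_true + 0.3 * rng.standard_normal(n)

    K = gaussian_kernel(X, X)
    for method in ("ridge", "pcr"):
        alpha = spectral_filter_fit(K, y, lam=1e-3, method=method)
        f_hat = K @ alpha
        print(method, "in-sample MSE vs. true function:", np.mean((f_hat - f_true) ** 2))

In the regularization literature, the Tikhonov filter is said to have qualification one while spectral cut-off has unlimited qualification; this is the adaptability gap between kernel ridge regression and kernel principal component regression that the paper's bound makes precise.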

Article information

Source
Electron. J. Statist. Volume 11, Number 1 (2017), 1022-1047.

Dates
Received: August 2016
First available in Project Euclid: 30 March 2017

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1490860815

Digital Object Identifier
doi:10.1214/17-EJS1258

Zentralblatt MATH identifier
1362.62087

Subjects
Primary: 62G08: Nonparametric regression

Keywords
Nonparametric regression; reproducing kernel Hilbert space

Rights
Creative Commons Attribution 4.0 International License.

Citation

Dicker, Lee H.; Foster, Dean P.; Hsu, Daniel. Kernel ridge vs. principal component regression: Minimax bounds and the qualification of regularization operators. Electron. J. Statist. 11 (2017), no. 1, 1022–1047. doi:10.1214/17-EJS1258. https://projecteuclid.org/euclid.ejs/1490860815


References

  • Aronszajn, N. (1950). Theory of reproducing kernels. Trans. Amer. Math. Soc. 68 337–404.
  • Bauer, F., Pereverzev, S. and Rosasco, L. (2007). On regularization algorithms in learning theory. J. Complexity 23 52–72.
  • Blanchard, G. and Mücke, N. (2016). Optimal rates for regularization of statistical inverse learning problems. arXiv preprint arXiv:1604.04054.
  • Caponnetto, A. and De Vito, E. (2007). Optimal rates for the regularized least-squares algorithm. Found. Comput. Math. 7 331–368.
  • Caponnetto, A. and Yao, Y. (2010). Cross-validation based adaptation for regularization operators in learning theory. Anal. Appl. 8 161–183.
  • Carmeli, C., De Vito, E. and Toigo, A. (2006). Vector valued reproducing kernel Hilbert spaces of integrable functions and Mercer theorem. Anal. Appl. 4 377–408.
  • Davis, C. and Kahan, W. M. (1970). The rotation of eigenvectors by a perturbation. III. SIAM J. Numer. Anal. 7 1–46.
  • Dhillon, P. S., Foster, D. P., Kakade, S. M. and Ungar, L. H. (2013). A risk comparison of ordinary least squares vs ridge regression. J. Mach. Learn. Res. 14 1505–1511.
  • Dicker, L. H. (2016). Ridge regression and asymptotic minimax estimation over spheres of growing dimension. Bernoulli 22 1–37.
  • Engl, H. W., Hanke, M. and Neubauer, A. (1996). Regularization of Inverse Problems. Mathematics and Its Applications, Vol. 375, Springer.
  • Guo, Z. C., Lin, S. B. and Zhou, D. X. (2016). Learning theory of distributed spectral algorithms. Preprint.
  • Györfi, L., Kohler, M., Krzyzak, A. and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer.
  • Hsu, D., Kakade, S. M. and Zhang, T. (2014). Random design analysis of ridge regression. Found. Comput. Math. 14 569–600.
  • Koltchinskii, V. (2006). Local Rademacher complexities and oracle inequalities in risk minimization. Ann. Stat. 34 2593–2656.
  • Lin, S. B., Guo, X. and Zhou, D. X. (2016). Distributed learning with regularized least squares. arXiv preprint arXiv:1608.03339.
  • Lo Gerfo, L., Rosasco, L., Odone, F., De Vito, E. and Verri, A. (2008). Spectral algorithms for supervised learning. Neural Comput. 20 1873–1897.
  • Mathé, P. (2005). Saturation of regularization methods for linear ill-posed problems in Hilbert spaces. SIAM J. Numer. Anal. 42 968–973.
  • Minsker, S. (2011). On some extensions of Bernstein's inequality for self-adjoint operators. arXiv preprint arXiv:1112.5448.
  • Neubauer, A. (1997). On converse and saturation results for Tikhonov regularization of linear ill-posed problems. SIAM J. Numer. Anal. 34 517–527.
  • Pinsker, M. S. (1980). Optimal filtration of square-integrable signals in Gaussian noise. Probl. Inf. Transm. 16 52–68.
  • Reed, M. and Simon, B. (1980). Methods of Modern Mathematical Physics, Volume 1: Functional Analysis. Academic Press.
  • Rosasco, L., De Vito, E. and Verri, A. (2005). Spectral methods for regularization in learning theory. Tech. Rep. DISI-TR-05-18, Università degli Studi di Genova, Italy.
  • Sonnenburg, S. (2008). Machine Learning for Genomic Sequence Analysis. Ph.D. thesis, Fraunhofer Institute FIRST. Supervised by K.-R. Müller and G. Rätsch.
  • Steinwart, I., Hush, D. and Scovel, C. (2009). Optimal rates for regularized least squares regression. In Conference on Learning Theory.
  • Tropp, J. A. (2015). An introduction to matrix concentration inequalities. arXiv preprint arXiv:1501.01571.
  • Wasserman, L. (2006). All of Nonparametric Statistics. Springer.
  • Zhang, T. (2005). Learning bounds for kernel regression using effective data dimensionality. Neural Comput. 17 2077–2098.
  • Zhang, Y., Duchi, J. C. and Wainwright, M. J. (2013). Divide and conquer kernel ridge regression. In Conference on Learning Theory.