Consistency and robustness of kernel-based regression in convex risk minimization

Andreas Christmann and Ingo Steinwart

Abstract

We investigate statistical properties of a broad class of modern kernel-based regression (KBR) methods. These kernel methods were developed during the last decade and are inspired by convex risk minimization in infinite-dimensional Hilbert spaces. One leading example is support vector regression. We first describe the relationship between the loss function L of the KBR method and the tail behaviour of the response variable. We then establish L-risk consistency of KBR, which provides the mathematical justification for the statement that these methods are able to “learn”. Next, we consider robustness properties of such kernel methods. In particular, our results allow us to choose the loss function and the kernel so as to obtain computationally tractable and consistent KBR methods that have bounded influence functions. Furthermore, we develop bounds for the bias and for the sensitivity curve, a finite-sample version of the influence function, and discuss the relationship between KBR and classical M-estimators.
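
As a rough illustration of the setting (a sketch, not code from the paper), the following Python snippet fits a KBR estimator of the form f_lambda = argmin over f in H of (1/n) sum_i L(y_i, f(x_i)) + lambda ||f||_H^2, with a Gaussian kernel and the Huber loss. The Huber loss has a clipped derivative, which is the mechanism behind the bounded influence functions discussed above. All function names, data, and tuning constants here are illustrative assumptions.

import numpy as np

def gaussian_kernel(X, Z, gamma=1.0):
    # RBF kernel matrix with entries k(x, z) = exp(-gamma * ||x - z||^2).
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def huber_grad(r, delta=1.0):
    # Derivative of the Huber loss in the residual r = f(x) - y.
    # It is bounded by delta; this clipping is the source of robustness.
    return np.clip(r, -delta, delta)

def fit_kbr(X, y, lam=0.1, gamma=1.0, delta=1.0, lr=0.5, n_iter=2000):
    # By the representer theorem, f(x) = sum_j alpha_j k(x_j, x), so we
    # minimize J(alpha) = (1/n) sum_i Huber((K alpha - y)_i) + lam a'Ka.
    # The update below is the gradient preconditioned by K^{-1}, which is
    # legitimate because the Gaussian kernel matrix is positive definite.
    n = len(y)
    K = gaussian_kernel(X, X, gamma)
    alpha = np.zeros(n)
    for _ in range(n_iter):
        r = K @ alpha - y  # residuals f(x_i) - y_i
        alpha -= lr * (huber_grad(r, delta) / n + 2 * lam * alpha)
    return alpha, K

# Toy data with one gross outlier; the bounded loss derivative keeps the
# outlier's pull on the fitted function bounded, unlike least squares.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
y[0] += 25.0  # gross outlier
alpha, K = fit_kbr(X, y)
print("max error away from the outlier:",
      np.abs((K @ alpha - np.sin(X[:, 0]))[1:]).max())

Replacing huber_grad with the identity map (i.e., using the squared loss) lets the single outlier dominate the fit; this is the kind of contrast the influence-function analysis in the paper makes precise.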

Article information

Source
Bernoulli, Volume 13, Number 3 (2007), 799-819.

Dates
First available in Project Euclid: 7 August 2007

Permanent link to this document
https://projecteuclid.org/euclid.bj/1186503487

Digital Object Identifier
doi:10.3150/07-BEJ5102

Mathematical Reviews number (MathSciNet)
MR2348751

Zentralblatt MATH identifier
1129.62031

Keywords
consistency; convex risk minimization; influence function; nonparametric regression; robustness; sensitivity curve; support vector regression

Citation

Christmann, Andreas; Steinwart, Ingo. Consistency and robustness of kernel-based regression in convex risk minimization. Bernoulli 13 (2007), no. 3, 799--819. doi:10.3150/07-BEJ5102. https://projecteuclid.org/euclid.bj/1186503487

