Regularization in kernel learning
Shahar Mendelson and Joseph Neeman
Source: Ann. Statist. Volume 38, Number 1 (2010), 526-565.
Abstract
Under mild assumptions on the kernel, we obtain the best known error rates in a regularized learning scenario taking place in the corresponding reproducing kernel Hilbert space (RKHS). The main novelty in the analysis is a proof that one can use a regularization term that grows significantly slower than the standard quadratic growth in the RKHS norm.
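For orientation, the contrast drawn in the abstract can be written out as below. This is only a sketch under assumptions not stated on this page: a squared loss, a sample $(X_i, Y_i)_{i=1}^n$, a regularization parameter $\lambda > 0$, and a generic penalty function $\rho$. None of this notation is taken from the article itself, and the article's specific penalty is not reproduced here.

```latex
% Standard regularized learning in an RKHS H with a quadratic penalty
% (illustrative notation; the squared loss is an assumption):
\[
  \hat f_{\mathrm{quad}}
  = \operatorname*{argmin}_{f \in H}
    \; \frac{1}{n} \sum_{i=1}^{n} \bigl( f(X_i) - Y_i \bigr)^2
    + \lambda \, \| f \|_H^2 .
\]
% Generic variant: the penalty is a function \rho of the RKHS norm that
% grows more slowly than quadratically (\rho is a hypothetical placeholder):
\[
  \hat f_{\rho}
  = \operatorname*{argmin}_{f \in H}
    \; \frac{1}{n} \sum_{i=1}^{n} \bigl( f(X_i) - Y_i \bigr)^2
    + \lambda \, \rho\bigl( \| f \|_H \bigr),
  \qquad
  \frac{\rho(t)}{t^2} \to 0 \ \text{as } t \to \infty .
\]
```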
Links and Identifiers
Permanent link to this document: http://projecteuclid.org/euclid.aos/1262271623
Digital Object Identifier: doi:10.1214/09-AOS728
Zentralblatt MATH identifier: 05673107
Mathematical Reviews number (MathSciNet): MR2590050