Bernoulli, Volume 17, Number 1 (2011), 211–225.

Estimating conditional quantiles with the help of the pinball loss

Ingo Steinwart and Andreas Christmann

Full-text: Open access


The so-called pinball loss for estimating conditional quantiles is a well-known tool in both statistics and machine learning. So far, however, little work has been done to quantify the efficiency of this tool for nonparametric approaches. We fill this gap by establishing inequalities that describe how close approximate pinball risk minimizers are to the corresponding conditional quantile. These inequalities, which hold under mild assumptions on the data-generating distribution, are then used to establish so-called variance bounds, which have recently turned out to play an important role in the statistical analysis of (regularized) empirical risk minimization approaches. Finally, we use both types of inequalities to establish an oracle inequality for support vector machines that use the pinball loss. The resulting learning rates are min–max optimal under some standard regularity assumptions on the conditional quantile.
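For readers unfamiliar with it, the pinball loss and the fact that its empirical risk minimizer is a sample quantile can be sketched in a few lines of Python (an illustrative sketch, not code from the paper; all names are our own):

```python
def pinball_loss(y, t, tau):
    """Pinball loss at quantile level tau: tau*(y - t) if y >= t, else (1 - tau)*(t - y)."""
    return tau * (y - t) if y >= t else (1 - tau) * (t - y)

def empirical_pinball_risk(data, t, tau):
    """Average pinball loss of the constant predictor t over the sample."""
    return sum(pinball_loss(y, t, tau) for y in data) / len(data)

# The empirical pinball risk is minimized at a tau-quantile of the sample;
# for tau = 0.5 it recovers the sample median.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
best = min(data, key=lambda t: empirical_pinball_risk(data, t, tau=0.5))
# best == 5.0, the median of the sample
```

Asymmetry in the two branches is what makes the minimizer the tau-quantile rather than the mean: overshooting and undershooting are penalized at rates (1 − tau) and tau, respectively.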

Article information


First available in Project Euclid: 8 February 2011


Keywords: nonparametric regression; quantile estimation; support vector machines


Steinwart, Ingo; Christmann, Andreas. Estimating conditional quantiles with the help of the pinball loss. Bernoulli 17 (2011), no. 1, 211--225. doi:10.3150/10-BEJ267.


