The Annals of Statistics

Ranking and Empirical Minimization of U-statistics

Stéphan Clémençon, Gábor Lugosi, and Nicolas Vayatis

Full-text: Open access

Abstract

The problem of ranking or ordering instances, rather than simply classifying them, has recently gained much attention in machine learning. In this paper we formulate the ranking problem in a rigorous statistical framework. The goal is to learn a ranking rule that decides, between two instances, which one is “better,” with minimum ranking risk. Since the natural estimates of the risk take the form of a U-statistic, results from the theory of U-processes are required to investigate the consistency of empirical risk minimizers. In particular, we establish a tail inequality for degenerate U-processes and apply it to show that fast rates of convergence may be achieved under specific noise assumptions, just as in classification. Convex risk minimization methods are also studied.
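
To illustrate why the natural risk estimate is a U-statistic, here is a minimal sketch that is not taken from the paper's text. It assumes a ranking rule r(x, x') returning +1 when x is ranked above x' and -1 otherwise, and it assumes the ranking risk is the probability that r misorders a pair, that is, (Y - Y') r(X, X') < 0. The function names, the score-based rule, and the toy data are hypothetical.

```python
from itertools import permutations


def empirical_ranking_risk(xs, ys, rule):
    """Average misordering indicator over all ordered pairs (i, j), i != j.

    Averaging a kernel over all pairs of distinct observations is exactly
    the form of a U-statistic of order two.
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two instances")
    errors = 0
    for i, j in permutations(range(n), 2):
        # Assumed convention: rule(x, x') is +1 if x is ranked above x', else -1.
        # A pair is misordered when the label difference and the rule disagree in sign.
        if (ys[i] - ys[j]) * rule(xs[i], xs[j]) < 0:
            errors += 1
    return errors / (n * (n - 1))


def score_rule(x, x_prime):
    """Hypothetical rule: rank the instance with the larger value higher."""
    return 1 if x >= x_prime else -1


# Toy data (made up for illustration): the instances 0.3 and 0.35 are misordered.
xs = [0.1, 0.3, 0.35, 0.8]
ys = [0, 1, 0, 1]
risk = empirical_ranking_risk(xs, ys, score_rule)
print(risk)
```

In this toy sample, 2 of the 12 ordered pairs are misordered, so the estimate is 1/6. Pairs with equal labels never count as errors under the assumed definition.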

Article information

Source
Ann. Statist., Volume 36, Number 2 (2008), 844-874.

Dates
First available in Project Euclid: 13 March 2008

Permanent link to this document
https://projecteuclid.org/euclid.aos/1205420521

Digital Object Identifier
doi:10.1214/009052607000000910

Mathematical Reviews number (MathSciNet)
MR2396817

Zentralblatt MATH identifier
1181.68160

Subjects
Primary: 68Q32: Computational learning theory [See also 68T05]
60E15: Inequalities; stochastic orderings
60C05: Combinatorial probability
60G25: Prediction theory [See also 62M20]

Keywords
Statistical learning, theory of classification, VC classes, fast rates, convex risk minimization, moment inequalities, U-processes

Citation

Clémençon, Stéphan; Lugosi, Gábor; Vayatis, Nicolas. Ranking and Empirical Minimization of U-statistics. Ann. Statist. 36 (2008), no. 2, 844–874. doi:10.1214/009052607000000910. https://projecteuclid.org/euclid.aos/1205420521


