Statistical Science
- Statist. Sci.
- Volume 22, Number 4 (2007), 477-505.
Boosting Algorithms: Regularization, Prediction and Model Fitting
Peter Bühlmann and Torsten Hothorn
Full-text: Open access
Abstract
We present a statistical perspective on boosting. Special emphasis is given to estimating potentially complex parametric or nonparametric models, including generalized linear and additive models as well as regression models for survival analysis. Concepts of degrees of freedom and corresponding Akaike or Bayesian information criteria, particularly useful for regularization and variable selection in high-dimensional covariate spaces, are discussed as well.
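To give a concrete flavor of the criteria discussed (a sketch in the spirit of the paper, using the corrected AIC of Hurvich et al., reference [49] below): for L2-boosting the fitted values after m iterations can be written as $\mathcal{B}_m Y$ for a linear boosting operator $\mathcal{B}_m$, suggesting degrees of freedom $\mathrm{df}(m) = \operatorname{trace}(\mathcal{B}_m)$ and the stopping criterion
$$ \mathrm{AIC}_c(m) = \log(\hat\sigma^2) + \frac{1 + \operatorname{trace}(\mathcal{B}_m)/n}{1 - (\operatorname{trace}(\mathcal{B}_m) + 2)/n}, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\bigl(Y_i - (\mathcal{B}_m Y)_i\bigr)^2, $$
minimized over the iteration number m.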
The practical aspects of boosting procedures for fitting statistical models are illustrated by means of the dedicated open-source software package mboost. This package implements functions which can be used for model fitting, prediction and variable selection. It is flexible, allowing for the implementation of new boosting algorithms optimizing user-specified loss functions.
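As a minimal usage sketch (assuming the mboost interface; the bodyfat data with response DEXfat come from the package's examples, and their location in the TH.data package is an assumption about current package versions):

    ## componentwise L2-boosting for a linear model, with the stopping
    ## iteration chosen by the corrected AIC
    library("mboost")
    data("bodyfat", package = "TH.data")  # assumed data location
    fit <- glmboost(DEXfat ~ ., data = bodyfat,
                    control = boost_control(mstop = 100, nu = 0.1))
    aic <- AIC(fit, method = "corrected")
    fit <- fit[mstop(aic)]   # refit at the AIC-optimal iteration
    coef(fit)                # coefficients of the selected variables
    pred <- predict(fit, newdata = bodyfat)

    ## a user-specified loss (here L1) via its negative gradient
    L1Fam <- Family(ngradient = function(y, f, w = 1) sign(y - f),
                    loss = function(y, f) abs(y - f))
    fitL1 <- gamboost(DEXfat ~ ., data = bodyfat, family = L1Fam)

Here glmboost() fits componentwise linear base-learners, while gamboost() uses smooth base-learners and yields an additive model fit.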
Article information
Source
Statist. Sci., Volume 22, Number 4 (2007), 477-505.
Dates
First available in Project Euclid: 7 April 2008
Permanent link to this document
https://projecteuclid.org/euclid.ss/1207580163
Digital Object Identifier
doi:10.1214/07-STS242
Mathematical Reviews number (MathSciNet)
MR2420454
Zentralblatt MATH identifier
1246.62163
Keywords
Generalized linear models; generalized additive models; gradient boosting; survival analysis; variable selection; software
Citation
Bühlmann, Peter; Hothorn, Torsten. Boosting Algorithms: Regularization, Prediction and Model Fitting. Statist. Sci. 22 (2007), no. 4, 477--505. doi:10.1214/07-STS242. https://projecteuclid.org/euclid.ss/1207580163
References
- [1] Amit, Y. and Geman, D. (1997). Shape quantization and recognition with randomized trees. Neural Computation 9 1545–1588.
- [2] Audrino, F. and Barone-Adesi, G. (2005). Functional gradient descent for financial time series with an application to the measurement of market risk. J. Banking and Finance 29 959–977.
- [3] Audrino, F. and Barone-Adesi, G. (2005). A multivariate FGD technique to improve VaR computation in equity markets. Comput. Management Sci. 2 87–106.
- [4] Audrino, F. and Bühlmann, P. (2003). Volatility estimation with functional gradient descent for very high-dimensional financial time series. J. Comput. Finance 6 65–89.
- [5] Bartlett, P. (2003). Prediction algorithms: Complexity, concentration and convexity. In Proceedings of the 13th IFAC Symp. on System Identification.
- [6] Bartlett, P. L., Jordan, M. and McAuliffe, J. (2006). Convexity, classification, and risk bounds. J. Amer. Statist. Assoc. 101 138–156. MR2268032. doi:10.1198/016214505000000907. Zbl 1118.62330.
- [7] Bartlett, P. and Traskin, M. (2007). AdaBoost is consistent. J. Mach. Learn. Res. 8 2347–2368. MR2353835.
- [8] Benner, A. (2002). Application of “aggregated classifiers” in survival time studies. In Proceedings in Computational Statistics (COMPSTAT) (W. Härdle and B. Rönz, eds.) 171–176. Physica-Verlag, Heidelberg. MR1973489.
- [9] Binder, H. (2006). GAMBoost: Generalized additive models by likelihood based boosting. R package version 0.9-3. Available at http://CRAN.R-project.org.
- [10] Bissantz, N., Hohage, T., Munk, A. and Ruymgaart, F. (2007). Convergence rates of general regularization methods for statistical inverse problems and applications. SIAM J. Numer. Anal. 45 2610–2636. MR2361904. doi:10.1137/060651884. Zbl 05485711.
- [11] Blake, C. L. and Merz, C. J. (1998). UCI repository of machine learning databases. Available at http://www.ics.uci.edu/~mlearn/MLRepository.html.
- [12] Blanchard, G., Lugosi, G. and Vayatis, N. (2003). On the rate of convergence of regularized boosting classifiers. J. Machine Learning Research 4 861–894.
- [13] Breiman, L. (1995). Better subset regression using the nonnegative garrote. Technometrics 37 373–384. MR1365720. doi:10.2307/1269730. Zbl 0862.62059.
- [14] Breiman, L. (1996). Bagging predictors. Machine Learning 24 123–140.
- [15] Breiman, L. (1998). Arcing classifiers (with discussion). Ann. Statist. 26 801–849. MR1635406. doi:10.1214/aos/1024691079. Zbl 0934.62064.
- [16] Breiman, L. (1999). Prediction games and arcing algorithms. Neural Computation 11 1493–1517.
- [17] Breiman, L. (2001). Random forests. Machine Learning 45 5–32.
- [18] Bühlmann, P. (2006). Boosting for high-dimensional linear models. Ann. Statist. 34 559–583.
- [19] Bühlmann, P. (2007). Twin boosting: Improved feature selection and prediction. Technical report, ETH Zürich. Available at ftp://ftp.stat.math.ethz.ch/Research-Reports/Other-Manuscripts/buhlmann/TwinBoosting1.pdf.
- [20] Bühlmann, P. and Lutz, R. (2006). Boosting algorithms: With an application to bootstrapping multivariate time series. In The Frontiers in Statistics (J. Fan and H. Koul, eds.) 209–230. Imperial College Press, London.
- [21] Bühlmann, P. and Yu, B. (2000). Discussion on “Additive logistic regression: A statistical view of boosting,” by J. Friedman, T. Hastie and R. Tibshirani. Ann. Statist. 28 377–386.
- [22] Bühlmann, P. and Yu, B. (2003). Boosting with the L2 loss: Regression and classification. J. Amer. Statist. Assoc. 98 324–339.
- [23] Bühlmann, P. and Yu, B. (2006). Sparse boosting. J. Machine Learning Research 7 1001–1024.
- [24] Buja, A., Stuetzle, W. and Shen, Y. (2005). Loss functions for binary class probability estimation: Structure and applications. Technical report, Univ. Washington. Available at http://www.stat.washington.edu/wxs/Learning-papers/paper-proper-scoring.pdf.
- [25] Dettling, M. (2004). BagBoosting for tumor classification with gene expression data. Bioinformatics 20 3583–3593.
- [26] Dettling, M. and Bühlmann, P. (2003). Boosting for tumor classification with gene expression data. Bioinformatics 19 1061–1069.
- [27] DiMarzio, M. and Taylor, C. (2008). On boosting kernel regression. J. Statist. Plann. Inference. To appear. MR2432380. doi:10.1016/j.jspi.2007.10.005. Zbl 1182.62091.
- [28] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression (with discussion). Ann. Statist. 32 407–499. MR2060166. doi:10.1214/009053604000000067. Zbl 1091.62054.
- [29] Freund, Y. and Schapire, R. (1995). A decision-theoretic generalization of on-line learning and an application to boosting. In Proceedings of the Second European Conference on Computational Learning Theory. Springer, Berlin.
- [30] Freund, Y. and Schapire, R. (1996). Experiments with a new boosting algorithm. In Proceedings of the Thirteenth International Conference on Machine Learning. Morgan Kaufmann, San Francisco, CA.
- [31] Freund, Y. and Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55 119–139. MR1473055. doi:10.1006/jcss.1997.1504. Zbl 0880.68103.
- [32] Friedman, J. (2001). Greedy function approximation: A gradient boosting machine. Ann. Statist. 29 1189–1232. MR1873328. doi:10.1214/aos/1013203451. Zbl 1043.62034.
- [33] Friedman, J., Hastie, T. and Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting (with discussion). Ann. Statist. 28 337–407. MR1790002. doi:10.1214/aos/1016218223. Zbl 1106.62323.
- [34] Garcia, A. L., Wagner, K., Hothorn, T., Koebnick, C., Zunft, H. J. and Trippo, U. (2005). Improved prediction of body fat by measuring skinfold thickness, circumferences, and bone breadths. Obesity Research 13 626–634.
- [35] Gentleman, R. C., Carey, V. J., Bates, D. M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y., Gentry, J., Hornik, K., Hothorn, T., Huber, M., Iacus, S., Irizarry, R., Leisch, F., Li, C., Mächler, M., Rossini, A. J., Sawitzki, G., Smith, C., Smyth, G., Tierney, L., Yang, J. Y. and Zhang, J. (2004). Bioconductor: Open software development for computational biology and bioinformatics. Genome Biology 5 R80.
- [36] Green, P. and Silverman, B. (1994). Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. Chapman and Hall, New York.
- [37] Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional predictor selection and the virtue of over-parametrization. Bernoulli 10 971–988.
- [38] Hansen, M. and Yu, B. (2001). Model selection and minimum description length principle. J. Amer. Statist. Assoc. 96 746–774. MR1939352. doi:10.1198/016214501753168398. Zbl 1017.62004.
- [39] Hastie, T. and Efron, B. (2004). lars: Least angle regression, lasso and forward stagewise. R package version 0.9-7. Available at http://CRAN.R-project.org.
- [40] Hastie, T. and Tibshirani, R. (1986). Generalized additive models (with discussion). Statist. Sci. 1 297–318. MR858512. doi:10.1214/ss/1177013604.
- [41] Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models. Chapman and Hall, London.
- [42] Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, New York.
- [43] Hothorn, T. and Bühlmann, P. (2007). mboost: Model-based boosting. R package version 0.5-8. Available at http://CRAN.R-project.org/.
- [44] Hothorn, T. and Bühlmann, P. (2006). Model-based boosting in high dimensions. Bioinformatics 22 2828–2829.
- [45] Hothorn, T., Bühlmann, P., Dudoit, S., Molinaro, A. and van der Laan, M. (2006). Survival ensembles. Biostatistics 7 355–373.
- [46] Hothorn, T., Hornik, K. and Zeileis, A. (2006). party: A laboratory for recursive part(y)itioning. R package version 0.9-11. Available at http://CRAN.R-project.org/.
- [47] Hothorn, T., Hornik, K. and Zeileis, A. (2006). Unbiased recursive partitioning: A conditional inference framework. J. Comput. Graph. Statist. 15 651–674. MR2291267. doi:10.1198/106186006X133933.
- [48] Huang, J., Ma, S. and Zhang, C.-H. (2008). Adaptive Lasso for sparse high-dimensional regression. Statist. Sinica. To appear.
- [49] Hurvich, C., Simonoff, J. and Tsai, C.-L. (1998). Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. J. Roy. Statist. Soc. Ser. B 60 271–293. MR1616041. doi:10.1111/1467-9868.00125. Zbl 0909.62039.
- [50] Iyer, R., Lewis, D., Schapire, R., Singer, Y. and Singhal, A. (2000). Boosting for document routing. In Proceedings of CIKM-00, 9th ACM Int. Conf. on Information and Knowledge Management (A. Agah, J. Callan and E. Rundensteiner, eds.). ACM Press, New York.
- [51] Jiang, W. (2004). Process consistency for AdaBoost (with discussion). Ann. Statist. 32 13–29, 85–134. MR2050999. doi:10.1214/aos/1079120128. Zbl 1105.62316.
- [52] Kearns, M. and Valiant, L. (1994). Cryptographic limitations on learning Boolean formulae and finite automata. J. Assoc. Comput. Machinery 41 67–95.
- [53] Koltchinskii, V. and Panchenko, D. (2002). Empirical margin distributions and bounding the generalization error of combined classifiers. Ann. Statist. 30 1–50.
- [54] Leitenstorfer, F. and Tutz, G. (2006). Smoothing with curvature constraints based on boosting techniques. In Proceedings in Computational Statistics (COMPSTAT) (A. Rizzi and M. Vichi, eds.). Physica-Verlag, Heidelberg. MR2347200.
- [55] Leitenstorfer, F. and Tutz, G. (2007). Generalized monotonic regression based on B-splines with an application to air pollution data. Biostatistics 8 654–673.
- [56] Leitenstorfer, F. and Tutz, G. (2007). Knot selection by boosting techniques. Comput. Statist. Data Anal. 51 4605–4621. MR2364468.
- [57] Lozano, A., Kulkarni, S. and Schapire, R. (2006). Convergence and consistency of regularized boosting algorithms with stationary β-mixing observations. In Advances in Neural Information Processing Systems 18 (Y. Weiss, B. Schölkopf and J. Platt, eds.). MIT Press.
- [58] Lugosi, G. and Vayatis, N. (2004). On the Bayes-risk consistency of regularized boosting methods (with discussion). Ann. Statist. 32 30–55, 85–134.
- [59] Lutz, R. and Bühlmann, P. (2006). Boosting for high-multivariate responses in high-dimensional linear regression. Statist. Sinica 16 471–494.
- [60] Mallat, S. and Zhang, Z. (1993). Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing 41 3397–3415.
- [61] Mannor, S., Meir, R. and Zhang, T. (2003). Greedy algorithms for classification–consistency, convergence rates, and adaptivity. J. Machine Learning Research 4 713–741.
- [62] Mason, L., Baxter, J., Bartlett, P. and Frean, M. (2000). Functional gradient techniques for combining hypotheses. In Advances in Large Margin Classifiers (A. Smola, P. Bartlett, B. Schölkopf and D. Schuurmans, eds.) 221–246. MIT Press, Cambridge. MR1820960.
- [63] McCaffrey, D. F., Ridgeway, G. and Morral, A. R. G. (2004). Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychological Methods 9 403–425.
- [64] Mease, D., Wyner, A. and Buja, A. (2007). Cost-weighted boosting with jittering and over/under-sampling: JOUS-boost. J. Machine Learning Research 8 409–439.
- [65] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. Ann. Statist. 34 1436–1462.
- [66] Meir, R. and Rätsch, G. (2003). An introduction to boosting and leveraging. In Advanced Lectures on Machine Learning (S. Mendelson and A. Smola, eds.). Springer, Berlin.
- [67] Osborne, M., Presnell, B. and Turlach, B. (2000). A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20 389–403. MR1773265. doi:10.1093/imanum/20.3.389. Zbl 0962.65036.
- [68] Park, M.-Y. and Hastie, T. (2007). An L1 regularization-path algorithm for generalized linear models. J. Roy. Statist. Soc. Ser. B 69 659–677. MR2370074. doi:10.1111/j.1467-9868.2007.00607.x.
- [69] R Development Core Team (2006). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Available at http://www.R-project.org.
- [70] Rätsch, G., Onoda, T. and Müller, K. (2001). Soft margins for AdaBoost. Machine Learning 42 287–320.
- [71] Ridgeway, G. (1999). The state of boosting. Comput. Sci. Statistics 31 172–181.
- [72] Ridgeway, G. (2000). Discussion on “Additive logistic regression: A statistical view of boosting,” by J. Friedman, T. Hastie and R. Tibshirani. Ann. Statist. 28 393–400. MR1790002. doi:10.1214/aos/1016218223. Zbl 1106.62323.
- [73] Ridgeway, G. (2002). Looking for lumps: Boosting and bagging for density estimation. Comput. Statist. Data Anal. 38 379–392. MR1884870.
- [74] Ridgeway, G. (2006). gbm: Generalized boosted regression models. R package version 1.5-7. Available at http://www.i-pensieri.com/gregr/gbm.shtml.
- [75] Schapire, R. (1990). The strength of weak learnability. Machine Learning 5 197–227.
- [76] Schapire, R. (2002). The boosting approach to machine learning: An overview. Nonlinear Estimation and Classification. Lecture Notes in Statist. 171 149–171. Springer, New York. MR2005788.
- [77] Schapire, R., Freund, Y., Bartlett, P. and Lee, W. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. Ann. Statist. 26 1651–1686. MR1673273. doi:10.1214/aos/1024691352. Zbl 0929.62069.
- [78] Schapire, R. and Singer, Y. (2000). Boostexter: A boosting-based system for text categorization. Machine Learning 39 135–168.
- [79] Southwell, R. (1946). Relaxation Methods in Theoretical Physics. Clarendon Press, Oxford. MR18983.
- [80] Street, W. N., Mangasarian, O. L., and Wolberg, W. H. (1995). An inductive learning approach to prognostic prediction. In Proceedings of the Twelfth International Conference on Machine Learning. Morgan Kaufmann, San Francisco, CA.
- [81] Temlyakov, V. (2000). Weak greedy algorithms. Adv. Comput. Math. 12 213–227.
- [82] Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
- [83] Tukey, J. (1977). Exploratory Data Analysis. Addison-Wesley, Reading, MA.
- [84] Tutz, G. and Binder, H. (2006). Generalized additive modelling with implicit variable selection by likelihood based boosting. Biometrics 62 961–971. MR2297666. doi:10.1111/j.1541-0420.2006.00578.x. Zbl 1116.62075.
- [85] Tutz, G. and Binder, H. (2007). Boosting Ridge regression. Comput. Statist. Data Anal. 51 6044–6059. MR2407697.
- [86] Tutz, G. and Hechenbichler, K. (2005). Aggregating classifiers with ordinal response structure. J. Statist. Comput. Simul. 75 391–408. MR2136546. doi:10.1080/00949650410001729481.
- [87] Tutz, G. and Leitenstorfer, F. (2007). Generalized smooth monotonic regression in additive modelling. J. Comput. Graph. Statist. 16 165–188. MR2345751. doi:10.1198/106186007X180949.
- [88] Tutz, G. and Reithinger, F. (2007). Flexible semiparametric mixed models. Statistics in Medicine 26 2872–2900.
- [89] van der Laan, M. and Robins, J. (2003). Unified Methods for Censored Longitudinal Data and Causality. Springer, New York.
- [90] West, M., Blanchette, C., Dressman, H., Huang, E., Ishida, S., Spang, R., Zuzan, H., Olson, J., Marks, J. and Nevins, J. (2001). Predicting the clinical status of human breast cancer by using gene expression profiles. Proc. Natl. Acad. Sci. USA 98 11462–11467.
- [91] Yao, Y., Rosasco, L. and Caponnetto, A. (2007). On early stopping in gradient descent learning. Constr. Approx. 26 289–315. MR2327601. doi:10.1007/s00365-006-0663-2. Zbl 1125.62035.
- [92] Zhang, T. and Yu, B. (2005). Boosting with early stopping: Convergence and consistency. Ann. Statist. 33 1538–1579. MR2166555. doi:10.1214/009053605000000255. Zbl 1078.62038.
- [93] Zhao, P. and Yu, B. (2007). Stagewise Lasso. J. Mach. Learn. Res. 8 2701–2726. MR2383572.
- [94] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Machine Learning Research 7 2541–2563. MR2274449.
- [95] Zhu, J., Rosset, S., Zou, H. and Hastie, T. (2005). Multiclass AdaBoost. Technical report, Stanford Univ. Available at http://www-stat.stanford.edu/~hastie/Papers/samme.pdf.
- [96] Zou, H. (2006). The adaptive Lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429. MR2279469. doi:10.1198/016214506000000735. Zbl 1171.62326.

More like this
- Comment: Boosting Algorithms: Regularization, Prediction and Model Fitting. Buja, Andreas, Mease, David, and Wyner, Abraham J., Statistical Science, 2007
- Object-Oriented Programming, Functional Programming and R. Chambers, John M., Statistical Science, 2014
- BART: Bayesian additive regression trees. Chipman, Hugh A., George, Edward I., and McCulloch, Robert E., The Annals of Applied Statistics, 2010
- Robust boosting with truncated loss functions. Wang, Zhu, Electronic Journal of Statistics, 2018
- Generalized functional additive mixed models. Scheipl, Fabian, Gertheiss, Jan, and Greven, Sonja, Electronic Journal of Statistics, 2016
- BS-SIM: An effective variable selection method for high-dimensional single index model. Cheng, Longjie, Zeng, Peng, and Zhu, Yu, Electronic Journal of Statistics, 2017
- High-dimensional data: p >> n in mathematical statistics and bio-medical applications. Van De Geer, Sara A. and Van Houwelingen, Hans C., Bernoulli, 2004
- Variable selection using MM algorithms. Hunter, David R. and Li, Runze, The Annals of Statistics, 2005
- Random survival forests. Ishwaran, Hemant, Kogalur, Udaya B., Blackstone, Eugene H., and Lauer, Michael S., The Annals of Applied Statistics, 2008
- The composite absolute penalties family for grouped and hierarchical variable selection. Zhao, Peng, Rocha, Guilherme, and Yu, Bin, The Annals of Statistics, 2009