Electronic Journal of Statistics

Robust boosting with truncated loss functions

Zhu Wang

Full-text: Open access


Boosting is a powerful machine learning tool with attractive theoretical properties. In recent years, boosting algorithms have been extended to many statistical estimation problems. For data contaminated with outliers, however, the development of boosting algorithms is very limited. In this paper, innovative robust boosting algorithms utilizing the majorization-minimization (MM) principle are developed for binary and multi-category classification problems. Based on truncated loss functions, the robust boosting algorithms share a unified framework for linear and nonlinear effects models. The proposed methods can reduce the heavy influence of a small number of outliers that could otherwise distort the results. In addition, adaptive boosting for the truncated loss functions is developed to construct sparser predictive models. We present convergence guarantees for smooth surrogate loss functions with both iteration-varying and constant step-sizes. We conducted empirical studies using data from simulations, a pediatric database developed for the US Healthcare Cost and Utilization Project, and breast cancer gene expression data. Compared with non-robust boosting, robust boosting improves classification accuracy and variable selection.
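To illustrate the idea behind truncation and the MM principle, the sketch below caps the logistic loss at its value at a chosen truncation point, so that extreme misclassified observations cannot contribute unbounded loss. This is a minimal illustration under assumed conventions, not the paper's implementation: the function names, the choice of the logistic loss, and the truncation point `s` are all hypothetical, and the hard 0/1 MM weight is one simple consequence of majorizing a truncated (difference-of-convex) loss, in which observations past the truncation point drop out of the weighted convex subproblem.

```python
import math

def logistic_loss(margin):
    # Standard logistic loss log(1 + exp(-margin)) on the classification
    # margin y*f(x); for very negative margins use the asymptote -margin
    # to avoid overflow in exp().
    if margin < -30.0:
        return -margin
    return math.log1p(math.exp(-margin))

def truncated_logistic_loss(margin, s=-1.0):
    # Truncated loss: cap the convex loss at its value at margin s,
    # so an outlier with an arbitrarily bad margin has bounded influence.
    return min(logistic_loss(margin), logistic_loss(s))

def mm_weight(margin, s=-1.0):
    # One MM surrogate for the truncated loss: observations whose margin
    # falls below the truncation point get weight 0 in the next weighted
    # convex boosting step; the rest keep full weight.
    return 1.0 if margin >= s else 0.0
```

For example, an observation with margin -100 incurs the same (capped) loss as one with margin -5 once both fall past the truncation point, and both receive MM weight 0 in the surrogate problem, which is how the heavy influence of outliers is removed.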

Article information

Electron. J. Statist. Volume 12, Number 1 (2018), 599-650.

Received: August 2016
First available in Project Euclid: 27 February 2018

Primary: 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20] 62G35: Robustness
Secondary: 68Q32: Computational learning theory [See also 68T05] 90C26: Nonconvex programming, global optimization

Keywords: robust method; machine learning; boosting; difference of convex; MM algorithm

Creative Commons Attribution 4.0 International License.


Wang, Zhu. Robust boosting with truncated loss functions. Electron. J. Statist. 12 (2018), no. 1, 599--650. doi:10.1214/18-EJS1404. https://projecteuclid.org/euclid.ejs/1519700496


