Electronic Journal of Statistics
- Electron. J. Statist.
- Volume 12, Number 1 (2018), 599-650.
Robust boosting with truncated loss functions
Boosting is a powerful machine learning tool with attractive theoretical properties. In recent years, boosting algorithms have been extended to many statistical estimation problems. For data contaminated with outliers, however, the development of boosting algorithms is very limited. In this paper, innovative robust boosting algorithms utilizing the majorization-minimization (MM) principle are developed for binary and multi-category classification problems. Based on truncated loss functions, the robust boosting algorithms share a unified framework for linear and nonlinear effects models. The proposed methods can reduce the heavy influence of a small number of outliers that could otherwise distort the results. In addition, adaptive boosting for the truncated loss functions is developed to construct sparser predictive models. We present convergence guarantees for smooth surrogate loss functions with both iteration-varying and constant step-sizes. We conducted empirical studies using data from simulations, a pediatric database developed for the US Healthcare Cost and Utilization Project, and breast cancer gene expression data. Compared with non-robust boosting, robust boosting improves classification accuracy and variable selection.
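The abstract's core idea — majorizing a truncated loss so that high-loss observations stop driving the fit — can be sketched in a few lines. The code below is a minimal illustration, not the paper's algorithm: it assumes a truncated logistic loss min(loss, t), uses the fact that observations whose current loss exceeds t have a constant majorizer (so they receive zero weight in the MM surrogate), and runs a simple componentwise linear gradient-boosting inner loop. The function name `robust_boost`, the 0/1 weighting, and the stump-free linear base learner are all simplifying assumptions for exposition.

```python
import numpy as np

def logistic_loss(margin):
    # numerically stable log(1 + exp(-margin))
    return np.logaddexp(0.0, -margin)

def robust_boost(X, y, threshold=2.0, n_mm=5, n_boost=50, lr=0.1):
    """MM sketch for boosting a truncated logistic loss min(loss, threshold).

    y takes values in {-1, +1}; the base learner is componentwise linear
    (a simplification for illustration, not the paper's exact setup).
    """
    n, p = X.shape
    f = np.zeros(n)       # current margin predictions
    coef = np.zeros(p)    # coefficients of the componentwise linear fit
    for _ in range(n_mm):
        # MM step: at the current fit, observations whose loss already
        # exceeds the truncation point have a constant majorizer, so they
        # drop out of the surrogate -- here encoded as a 0/1 weight.
        w = (logistic_loss(y * f) <= threshold).astype(float)
        for _ in range(n_boost):
            # weighted negative gradient of the untruncated logistic loss
            g = w * y / (1.0 + np.exp(y * f))
            # pick the feature most correlated with the gradient and take
            # an (approximate) least-squares step along it
            scores = X.T @ g
            j = np.argmax(np.abs(scores))
            step = scores[j] / (X[:, j] ** 2).sum()
            coef[j] += lr * step
            f += lr * step * X[:, j]
    return coef, f
```

Because mislabeled points accumulate large losses as the fit improves, later MM rounds assign them zero weight, which is the mechanism by which truncation limits the influence of outliers on both the fit and the selected variables.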
Received: August 2016
First available in Project Euclid: 27 February 2018
Primary: 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20] 62G35: Robustness
Secondary: 68Q32: Computational learning theory [See also 68T05] 90C26: Nonconvex programming, global optimization
Wang, Zhu. Robust boosting with truncated loss functions. Electron. J. Statist. 12 (2018), no. 1, 599--650. doi:10.1214/18-EJS1404. https://projecteuclid.org/euclid.ejs/1519700496