The Annals of Applied Probability
Ann. Appl. Probab., Volume 13, Number 1 (2003), 213-252.
Bounding the generalization error of convex combinations of classifiers: balancing the dimensionality and the margins
Vladimir Koltchinskii, Dmitriy Panchenko, and Fernando Lozano
Abstract
A problem of bounding the generalization error of a classifier $f\in \mathrm{conv}(\mathcal{H})$, where $\mathcal{H}$ is a "base" class of functions (classifiers), is considered. This problem frequently occurs in computer learning, where efficient algorithms that combine simple classifiers into a complex one (such as boosting and bagging) have attracted a lot of attention. Using Talagrand's concentration inequalities for empirical processes, we obtain new sharper bounds on the generalization error of combined classifiers that take into account both the empirical distribution of "classification margins" and an "approximate dimension" of the classifiers, and study the performance of these bounds in several experiments with learning algorithms.
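To make the quantities in the abstract concrete, here is a minimal, hypothetical sketch (not from the paper) of the empirical margin distribution for a convex combination of base classifiers: given base classifiers $h_j$ with values in $\{-1,+1\}$ and convex weights $w_j$, the margin of an example $(x,y)$ is $y\,f(x)$ with $f=\sum_j w_j h_j$, and margin-type bounds involve the fraction of examples whose margin falls below a threshold $\delta$. All names and the synthetic data below are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch: empirical margin distribution of a convex
# combination f = sum_j w_j * h_j of base classifiers h_j in {-1, +1}.
rng = np.random.default_rng(0)

n, T = 200, 5                      # sample size, number of base classifiers
y = rng.choice([-1, 1], size=n)    # labels in {-1, +1}
# Synthetic base-classifier predictions, each mildly correlated with y
H = np.sign(y[:, None] + 0.8 * rng.standard_normal((n, T)))
w = rng.random(T)
w /= w.sum()                       # convex-combination weights, sum to 1

f = H @ w                          # combined classifier, values in [-1, 1]
margins = y * f                    # classification margins y * f(x)

def margin_cdf(delta):
    """Fraction of examples with margin <= delta; this empirical
    margin distribution is the quantity entering margin-type bounds."""
    return np.mean(margins <= delta)

print(margin_cdf(0.0))             # fraction of misclassified-or-tied points
```

The bounds studied in the paper trade off this margin distribution against an "approximate dimension" of the combined classifier; the sketch only illustrates the margin side of that balance.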
Article information
Source
Ann. Appl. Probab. Volume 13, Number 1 (2003), 213-252.
Dates
First available in Project Euclid: 16 January 2003
Permanent link to this document
http://projecteuclid.org/euclid.aoap/1042765667
Digital Object Identifier
doi:10.1214/aoap/1042765667
Mathematical Reviews number (MathSciNet)
MR1951998
Zentralblatt MATH identifier
1073.62535
Subjects
Primary: 62G05: Estimation
Secondary: 62G20: Asymptotic properties 60F15: Strong theorems
Keywords
Generalization error; combined classifier; margin; approximate dimension; empirical process; Rademacher process; random entropies; concentration inequalities; boosting; bagging
Citation
Koltchinskii, Vladimir; Panchenko, Dmitriy; Lozano, Fernando. Bounding the generalization error of convex combinations of classifiers: balancing the dimensionality and the margins. Ann. Appl. Probab. 13 (2003), no. 1, 213--252. doi:10.1214/aoap/1042765667. http://projecteuclid.org/euclid.aoap/1042765667.

