The Annals of Statistics

Bandwidth choice for nonparametric classification

Peter Hall and Kee-Hoon Kang
Source: Ann. Statist. Volume 33, Number 1 (2005), 284-306.

Abstract

It is shown that, for kernel-based classification with univariate distributions and two populations, optimal bandwidth choice has a dichotomous character. If the two densities cross at just one point, where their curvatures have the same signs, then minimum Bayes risk is achieved using bandwidths which are an order of magnitude larger than those which minimize pointwise estimation error. On the other hand, if the curvature signs are different, or if there are multiple crossing points, then bandwidths of conventional size are generally appropriate. The range of different modes of behavior is narrower in multivariate settings. There, the optimal size of bandwidth is generally the same as that which is appropriate for pointwise density estimation. These properties motivate empirical rules for bandwidth choice.
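The setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure or their empirical rule. It assumes a Gaussian kernel, equal prior probabilities, one common bandwidth for both density estimates, and a simple hold-out estimate of the error rate. The function names (`kde`, `classify`, `error_rate`), the two normal populations and the bandwidth grid are all hypothetical choices made for illustration.

```python
import math
import random


def kde(x, sample, h):
    """Gaussian kernel density estimate at x with bandwidth h."""
    norm = len(sample) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample) / norm


def classify(x, sample_f, sample_g, h, prior_f=0.5):
    """Assign x to 'f' if the prior-weighted density estimate for f is at least that for g."""
    score_f = prior_f * kde(x, sample_f, h)
    score_g = (1.0 - prior_f) * kde(x, sample_g, h)
    return "f" if score_f >= score_g else "g"


def error_rate(h, train_f, train_g, test_f, test_g):
    """Fraction of held-out points misclassified when bandwidth h is used."""
    wrong = sum(classify(x, train_f, train_g, h) != "f" for x in test_f)
    wrong += sum(classify(x, train_f, train_g, h) != "g" for x in test_g)
    return wrong / (len(test_f) + len(test_g))


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Assumed populations: N(0, 1) and N(1.5, 1). Their densities cross once,
# at x = 0.75, and both are concave there (same curvature sign).
train_f = [rng.gauss(0.0, 1.0) for _ in range(200)]
train_g = [rng.gauss(1.5, 1.0) for _ in range(200)]
test_f = [rng.gauss(0.0, 1.0) for _ in range(500)]
test_g = [rng.gauss(1.5, 1.0) for _ in range(500)]

# Hypothetical bandwidth grid; compare hold-out error across it.
bandwidths = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6]
errors = {h: error_rate(h, train_f, train_g, test_f, test_g) for h in bandwidths}
best_h = min(errors, key=errors.get)
```

Comparing `errors` across the grid shows how the classification error responds to bandwidth in one simulated case. A single run at these sample sizes is only suggestive and does not verify the asymptotic results stated in the abstract.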

Primary Subjects: 62H30, 62C12
Secondary Subjects: 62G07
Full-text: Open access

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1112967707
Digital Object Identifier: doi:10.1214/009053604000000959
Zentralblatt MATH identifier: 02182564
Mathematical Reviews number (MathSciNet): MR2157804

2012 © Institute of Mathematical Statistics
