Open Access
Choice of neighbor order in nearest-neighbor classification
Peter Hall, Byeong U. Park, Richard J. Samworth
Ann. Statist. 36(5): 2135-2152 (October 2008). DOI: 10.1214/07-AOS537

Abstract

The kth-nearest neighbor rule is arguably the simplest and most intuitively appealing nonparametric classification procedure. However, application of this method is inhibited by lack of knowledge about its properties, in particular, about the manner in which it is influenced by the value of k; and by the absence of techniques for empirical choice of k. In the present paper we detail the way in which the value of k determines the misclassification error. We consider two models, Poisson and Binomial, for the training samples. Under the first model, data are recorded in a Poisson stream and are “assigned” to one or other of the two populations in accordance with the prior probabilities. In particular, the total number of data in both training samples is a Poisson-distributed random variable. Under the Binomial model, however, the total number of data in the training samples is fixed, although again each data value is assigned in a random way. Although the values of risk and regret associated with the Poisson and Binomial models are different, they are asymptotically equivalent to first order, and also to the risks associated with kernel-based classifiers that are tailored to the case of two derivatives. These properties motivate new methods for choosing the value of k.
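The two sampling models and the kth-nearest neighbor rule described above can be illustrated with a minimal sketch. It is not the authors' procedure for choosing k. The one-dimensional Gaussian populations, the function names, and the parameters `n`, `nu` and `prior_x` are assumptions introduced only for illustration. Under the Binomial model the total sample size is fixed at `n`; under the Poisson model it is a Poisson random variable with mean `nu`. In both cases each point is assigned to a population at random according to the prior probability.

```python
import random
from collections import Counter


def poisson_count(mean, rng):
    """Count arrivals of a rate-`mean` Poisson stream during one unit of time."""
    count, t = 0, rng.expovariate(mean)
    while t <= 1.0:
        count += 1
        t += rng.expovariate(mean)
    return count


def draw_sample(total, prior_x, rng):
    """Assign each of `total` points to population 0 with probability prior_x.

    Assumption for illustration: population 0 is N(0, 1), population 1 is N(1.5, 1).
    """
    sample = []
    for _ in range(total):
        label = 0 if rng.random() < prior_x else 1
        centre = 0.0 if label == 0 else 1.5
        sample.append((rng.gauss(centre, 1.0), label))
    return sample


def binomial_model(n, prior_x, rng):
    """Fixed total sample size n, random assignment to the two populations."""
    return draw_sample(n, prior_x, rng)


def poisson_model(nu, prior_x, rng):
    """Poisson-distributed total sample size with mean nu, random assignment."""
    return draw_sample(poisson_count(nu, rng), prior_x, rng)


def knn_classify(z, sample, k):
    """Majority vote among the k training points nearest to z (use odd k)."""
    if not sample:
        raise ValueError("training sample is empty")
    nearest = sorted(sample, key=lambda point: abs(point[0] - z))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def error_rate(sample, k, prior_x, rng, n_test=2000):
    """Estimate misclassification error on fresh draws from the same model."""
    test = draw_sample(n_test, prior_x, rng)
    wrong = sum(knn_classify(z, sample, k) != label for z, label in test)
    return wrong / n_test


if __name__ == "__main__":
    rng = random.Random(0)
    for name, sample in (("binomial", binomial_model(200, 0.5, rng)),
                         ("poisson", poisson_model(200, 0.5, rng))):
        for k in (1, 5, 15, 45):
            print(name, len(sample), k, error_rate(sample, k, 0.5, rng))
```

Running the script prints an estimated misclassification error for several odd values of k under each model, which gives a rough empirical view of how the error depends on k. The estimates are Monte Carlo approximations and vary with the seed.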

Citation

Peter Hall, Byeong U. Park, Richard J. Samworth. "Choice of neighbor order in nearest-neighbor classification." Ann. Statist. 36(5): 2135-2152, October 2008. https://doi.org/10.1214/07-AOS537

Information

Published: October 2008
First available in Project Euclid: 13 October 2008

zbMATH: 1274.62421
MathSciNet: MR2458182
Digital Object Identifier: 10.1214/07-AOS537

Subjects:
Primary: 62H30
Secondary: 62G20

Keywords: Bayes classifier, bootstrap resampling, Edgeworth expansion, error probability, misclassification error, nonparametric classification, Poisson distribution

Rights: Copyright © 2008 Institute of Mathematical Statistics
