The Annals of Statistics

Asymptotic expansions of the $k$ nearest neighbor risk

Robert R. Snapp and Santosh S. Venkatesh
Source: Ann. Statist. Volume 26, Number 3 (1998), 850-878.

Abstract

The finite-sample risk of the $k$ nearest neighbor classifier that uses a weighted $L^p$-metric as a measure of class similarity is examined. For a family of classification problems with smooth distributions in $\mathbb{R}^n$, an asymptotic expansion for the risk is obtained in decreasing fractional powers of the reference sample size. An analysis of the leading expansion coefficients reveals that the optimal weighted $L^p$-metric, that is, the metric that minimizes the finite-sample risk, tends to a weighted Euclidean (i.e., $L^2$) metric as the sample size is increased. Numerical simulations corroborate this finding for a pattern recognition problem with normal class-conditional densities.
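To make the object of study concrete, the following is a minimal sketch of a $k$ nearest neighbor classifier under a weighted $L^p$-metric. It is not the authors' code. It assumes the weighting is a per-coordinate (diagonal) scaling, which may be narrower than the paper's definition, and the function names `weighted_lp_distance` and `knn_classify` as well as the toy reference sample are hypothetical.

```python
from collections import Counter


def weighted_lp_distance(x, y, weights, p):
    """Assumed form of the metric: (sum_i (w_i * |x_i - y_i|)^p)^(1/p)."""
    total = sum((w * abs(a - b)) ** p for a, b, w in zip(x, y, weights))
    return total ** (1.0 / p)


def knn_classify(x, refs, labels, weights, p=2.0, k=1):
    """Majority vote among the k reference points nearest to x."""
    # Rank reference points by their weighted L^p distance to the query.
    order = sorted(
        range(len(refs)),
        key=lambda i: weighted_lp_distance(x, refs[i], weights, p),
    )
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy reference sample in R^2 (made-up data, for illustration only).
refs = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (0.9, 1.2), (0.1, 0.3)]
labels = [0, 0, 1, 1, 0]

# Classify one query point with k = 3 and the unweighted Euclidean metric.
prediction = knn_classify((0.8, 0.9), refs, labels, weights=(1.0, 1.0), p=2.0, k=3)
```

Varying `weights` or `p` changes which reference points count as nearest, and therefore the error rate at a given reference sample size. The abstract's result concerns which such choice minimizes that finite-sample risk.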

Primary Subjects: 62G20, 62H30, 41A60
Full-text: Open access
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1024691080
Mathematical Reviews number (MathSciNet): MR1635410
Digital Object Identifier: doi:10.1214/aos/1024691080
Zentralblatt MATH identifier: 0929.62070

Burlington, Vermont 05405. E-mail: snapp@cs.uvm.edu
Philadelphia, Pennsylvania 19104. E-mail: venkatesh@ee.upenn.edu

2013 © Institute of Mathematical Statistics
