Electronic Journal of Statistics

P-values for classification

Lutz Dümbgen, Bernd-Wolfgang Igl, and Axel Munk

Full-text: Open access


Let $(X,Y)$ be a random variable consisting of an observed feature vector $X\in \mathcal{X}$ and an unobserved class label $Y\in\{1,2,\ldots,L\}$ with unknown joint distribution. In addition, let $\mathcal{D}$ be a training data set consisting of $n$ completely observed independent copies of $(X,Y)$. Usual classification procedures provide point predictors (classifiers) $\widehat{Y}(X,\mathcal{D})$ of $Y$ or estimate the conditional distribution of $Y$ given $X$. In order to quantify the certainty of classifying $X$ we propose to construct for each $\theta=1,2,\ldots,L$ a p-value $\pi_{\theta}(X,\mathcal{D})$ for the null hypothesis that $Y=\theta$, treating $Y$ temporarily as a fixed parameter. In other words, the point predictor $\widehat{Y}(X,\mathcal{D})$ is replaced with a prediction region for $Y$ with a certain confidence. We argue that (i) this approach is advantageous over traditional approaches and (ii) any reasonable classifier can be modified to yield nonparametric p-values. We discuss issues such as optimality, single use and multiple use validity, as well as computational and graphical aspects.
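As an illustration of how a classifier might be turned into nonparametric p-values, the following is a minimal sketch and not the authors' exact construction. It assumes a k-nearest-neighbor typicality score and a rank-based (permutation-type) p-value within each class; the function names `knn_score` and `class_p_values`, the choice `k=3`, and the toy data are all hypothetical.

```python
import math

def knn_score(x, label, points, labels, k=3, skip=None):
    """Assumed typicality score: share of the k nearest neighbours of x
    (ignoring index `skip`, i.e. x itself) that carry `label`."""
    dists = sorted((math.dist(x, p), j) for j, p in enumerate(points) if j != skip)
    nearest = [j for _, j in dists[:k]]
    return sum(labels[j] == label for j in nearest) / len(nearest)

def class_p_values(x, train_x, train_y, classes, k=3):
    """For each candidate class theta, test the null hypothesis Y = theta."""
    p = {}
    for theta in classes:
        # Tentatively add (x, theta) to the training data.
        points = list(train_x) + [x]
        labels = list(train_y) + [theta]
        new = len(points) - 1
        members = [j for j, y in enumerate(labels) if y == theta]
        # Score every member of class theta, the new point included,
        # in the same way, so the scores are exchangeable under the null.
        scores = {j: knn_score(points[j], theta, points, labels, k, skip=j)
                  for j in members}
        # Small p-value: the new point looks less typical than its classmates.
        p[theta] = sum(scores[j] <= scores[new] for j in members) / len(members)
    return p

# Toy data: two well-separated classes in the plane (illustrative only).
train_x = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
train_y = [1, 1, 1, 1, 2, 2, 2, 2]
p = class_p_values((0.5, 0.5), train_x, train_y, classes=[1, 2])

# Prediction region at level alpha: all classes that are not rejected.
alpha = 0.25
region = [theta for theta, pv in p.items() if pv > alpha]
print(p, region)
```

In this sketch the smallest attainable p-value for class $\theta$ is $1/(n_\theta+1)$, where $n_\theta$ is the number of training observations in that class, so very small training classes cannot be excluded at small levels $\alpha$.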

Article information

Electron. J. Statist. Volume 2 (2008), 468-493.

First available in Project Euclid: 26 June 2008

Primary:
62C05: General considerations
62F25: Tolerance and confidence regions
62G09: Resampling methods
62G15: Tolerance and confidence regions
62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]

Keywords: nearest neighbors, nonparametric, optimality, permutation test, prediction region, ROC curve, typicality index, validity


Dümbgen, Lutz; Igl, Bernd-Wolfgang; Munk, Axel. P-values for classification. Electron. J. Statist. 2 (2008), 468--493. doi:10.1214/08-EJS245. https://projecteuclid.org/euclid.ejs/1214491852

