Open Access
2008 P-values for classification
Lutz Dümbgen, Bernd-Wolfgang Igl, Axel Munk
Electron. J. Statist. 2: 468-493 (2008). DOI: 10.1214/08-EJS245

Abstract

Let (X,Y) be a random variable consisting of an observed feature vector $X\in \mathcal{X}$ and an unobserved class label $Y\in\{1,2,\ldots,L\}$ with unknown joint distribution. In addition, let $\mathcal{D}$ be a training data set consisting of n completely observed independent copies of (X,Y). Usual classification procedures provide point predictors (classifiers) $\widehat{Y}(X,\mathcal{D})$ of Y or estimate the conditional distribution of Y given X. In order to quantify the certainty of classifying X we propose to construct for each $\theta=1,2,\ldots,L$ a p-value $\pi_{\theta}(X,\mathcal{D})$ for the null hypothesis that $Y=\theta$, treating Y temporarily as a fixed parameter. In other words, the point predictor $\widehat{Y}(X,\mathcal{D})$ is replaced with a prediction region for Y with a certain confidence. We argue that (i) this approach is advantageous over traditional approaches and (ii) any reasonable classifier can be modified to yield nonparametric p-values. We discuss issues such as optimality, single use and multiple use validity, as well as computational and graphical aspects.
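To make the idea of a class-wise p-value concrete, the following is a minimal sketch of a rank-based (permutation-type) p-value for the null hypothesis $Y=\theta$. It is not taken from the paper: the k-nearest-neighbour atypicality score and the names `knn_score`, `class_p_value`, and `prediction_region` are assumptions chosen for illustration. The new observation is temporarily added to the training data with label $\theta$, every member of class $\theta$ is scored in the same way, and the p-value is the fraction of class members that look at least as atypical as the new observation.

```python
import math


def knn_score(idx, points, labels, theta, k):
    """Atypicality of points[idx] for class theta (hypothetical score):
    the fraction of its k nearest neighbours, excluding itself,
    whose label differs from theta. Distance ties are broken by index."""
    dists = sorted(
        (math.dist(points[idx], points[j]), j)
        for j in range(len(points))
        if j != idx
    )
    neighbours = [j for _, j in dists[:k]]
    return sum(labels[j] != theta for j in neighbours) / len(neighbours)


def class_p_value(x_new, theta, train_x, train_y, k=3):
    """Rank-based p-value for the null hypothesis Y = theta."""
    # Temporarily treat the new observation as a member of class theta.
    points = list(train_x) + [x_new]
    labels = list(train_y) + [theta]
    new_idx = len(points) - 1

    # All observations currently labelled theta, including the new one.
    group = [i for i, y in enumerate(labels) if y == theta]

    score_new = knn_score(new_idx, points, labels, theta, k)
    # Count group members at least as atypical as the new observation.
    at_least = sum(
        knn_score(i, points, labels, theta, k) >= score_new for i in group
    )
    return at_least / len(group)


def prediction_region(x_new, train_x, train_y, classes, alpha=0.05, k=3):
    """All class labels whose p-value exceeds alpha."""
    return [
        c for c in classes
        if class_p_value(x_new, c, train_x, train_y, k) > alpha
    ]
```

Because the new observation is always counted in its own group, the smallest attainable p-value for class $\theta$ is $1/(N_\theta+1)$, where $N_\theta$ is the number of training observations with label $\theta$. A class can therefore only be excluded at level $\alpha$ if $N_\theta+1\ge 1/\alpha$.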

Citation

Lutz Dümbgen, Bernd-Wolfgang Igl, Axel Munk. "P-values for classification." Electron. J. Statist. 2: 468-493, 2008. https://doi.org/10.1214/08-EJS245

Information

Published: 2008
First available in Project Euclid: 26 June 2008

zbMATH: 1138.60313
MathSciNet: MR2417390
Digital Object Identifier: 10.1214/08-EJS245

Subjects:
Primary: 62C05, 62F25, 62G09, 62G15, 62H30

Keywords: nearest neighbors, nonparametric, optimality, permutation test, prediction region, ROC curve, typicality index, validity

Rights: Copyright © 2008 The Institute of Mathematical Statistics and the Bernoulli Society