### Optimal Predictive Linear Discriminants

Peter Enis and Seymour Geisser
Source: Ann. Statist. Volume 2, Number 2 (1974), 403-410.

#### Abstract

When classifying an observation $\mathbf{z}$ which has arisen (with known prior probabilities) from one of two $p$-variate nonsingular normal populations with known parameters, the discriminant, say $U$, which minimizes the total probability of misclassification is based on the logarithm of the ratio of the densities of the two populations. When the parameters are unknown, the "classical" procedure has been to substitute sample estimates for the unknown parameters in $U$ and use the resulting sample discriminant, say $V$, as the basis for classifying future observations. This procedure need not enjoy the property of minimizing the probability of misclassification and has been justified, from the classical point of view, almost entirely on the grounds that it seems intuitively reasonable. When the covariance matrices of the two normal populations are equal, $U$ is a linear function of the observation vector $\mathbf{z}$. The fact that $U$ minimizes the probability of misclassification does not imply that $V$ will. Further, although $U$ is linear, the sample discriminant which minimizes the probability of misclassification will, in general, not be linear. Here, using the Bayesian notion of a predictive distribution, we obtain from amongst the class of linear sample discriminants that one which minimizes the predictive probability of misclassification.

Primary Subjects: 62H30
Secondary Subjects: 62F15
Full-text: Open access
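The "classical" plug-in procedure described in the abstract can be sketched in a few lines: with known parameters, the optimal discriminant $U$ is linear in $\mathbf{z}$ when the covariance matrices are equal; the sample discriminant $V$ substitutes sample means and the pooled sample covariance. The sketch below (in Python with NumPy; the function names are illustrative, not from the paper) shows $V$, which, as the abstract notes, need not minimize the probability of misclassification.

```python
import numpy as np

def linear_discriminant(mu1, mu2, sigma, q1=0.5, q2=0.5):
    """Optimal discriminant U for two p-variate normals with known
    means mu1, mu2, common covariance sigma, and priors q1, q2:
    U(z) = (mu1-mu2)' S^{-1} z - (1/2)(mu1-mu2)' S^{-1} (mu1+mu2) + log(q1/q2).
    Classify z into population 1 iff U(z) > 0."""
    inv = np.linalg.inv(sigma)
    w = inv @ (mu1 - mu2)                       # linear coefficient vector
    c = -0.5 * (mu1 - mu2) @ inv @ (mu1 + mu2) + np.log(q1 / q2)
    return lambda z: float(w @ z + c)

def plug_in_discriminant(X1, X2, q1=0.5, q2=0.5):
    """'Classical' sample discriminant V: substitute the sample means
    and the pooled sample covariance for the unknown parameters in U."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    pooled = ((n1 - 1) * np.cov(X1, rowvar=False) +
              (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    return linear_discriminant(m1, m2, pooled, q1, q2)
```

For example, with `mu1 = (1, 0)`, `mu2 = (-1, 0)`, identity covariance, and equal priors, `U(z) = 2 z_1`, so points with positive first coordinate are assigned to population 1. The paper's point is that optimizing over *linear* sample discriminants under the predictive distribution yields a different, better-justified rule than this plug-in $V$.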