An Application of Information Theory to Multivariate Analysis

S. Kullback

doi:10.1214/aoms/1177729487

Ann. Math. Statist. 23(1): 88-102 (March, 1952). DOI: 10.1214/aoms/1177729487

Abstract

The problem considered is that of finding the "best" linear function for discriminating between two multivariate normal populations, $\pi_1$ and $\pi_2$, without limitation to the case of equal covariance matrices. The "best" linear function is found by maximizing the divergence, $J'(1, 2)$, between the distributions of the linear function. Comparison with the divergence, $J(1, 2)$, between $\pi_1$ and $\pi_2$ offers a measure of the discriminating efficiency of the linear function, since $J(1, 2) \geq J'(1, 2)$. The divergence, a special case of which is Mahalanobis's Generalized Distance, is defined in terms of a measure of information which is essentially that of Shannon and Wiener. Appropriate assumptions about $\pi_1$ and $\pi_2$ lead to discriminant analysis (Sections 4, 7), principal components (Section 5), and canonical correlations (Section 6).
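The quantities in the abstract can be made concrete numerically. The sketch below assumes the standard closed form for the divergence between two multivariate normal populations, $J(1,2) = I(1:2) + I(2:1) = \tfrac12 \operatorname{tr}[(\Sigma_1 - \Sigma_2)(\Sigma_2^{-1} - \Sigma_1^{-1})] + \tfrac12 \delta'(\Sigma_1^{-1} + \Sigma_2^{-1})\delta$ with $\delta = \mu_1 - \mu_2$. It applies the same formula to the univariate normal distributions of $y = a'x$ to obtain $J'(1,2)$. All helper names are hypothetical. The random search over directions $a$ is only a crude check that $J(1,2) \geq J'(1,2)$; it is not the maximization procedure derived in the paper.

```python
import random


def mat_inv(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def mat_mul(x, y):
    """Matrix product of nested lists x (p x q) and y (q x r)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


def quad(v, m):
    """Quadratic form v' M v."""
    n = len(v)
    return sum(v[i] * m[i][j] * v[j] for i in range(n) for j in range(n))


def divergence(mu1, s1, mu2, s2):
    """J(1,2) = I(1:2) + I(2:1) between N(mu1, s1) and N(mu2, s2)."""
    n = len(mu1)
    i1, i2 = mat_inv(s1), mat_inv(s2)
    diff_s = [[s1[i][j] - s2[i][j] for j in range(n)] for i in range(n)]
    diff_i = [[i2[i][j] - i1[i][j] for j in range(n)] for i in range(n)]
    sum_i = [[i1[i][j] + i2[i][j] for j in range(n)] for i in range(n)]
    delta = [x - y for x, y in zip(mu1, mu2)]
    return 0.5 * trace(mat_mul(diff_s, diff_i)) + 0.5 * quad(delta, sum_i)


def linear_divergence(a, mu1, s1, mu2, s2):
    """J'(1,2) for y = a'x, which is univariate normal under each population."""
    m1 = sum(ai * mi for ai, mi in zip(a, mu1))
    m2 = sum(ai * mi for ai, mi in zip(a, mu2))
    # Reuse the multivariate formula with 1 x 1 "covariance matrices" a' S a.
    return divergence([m1], [[quad(a, s1)]], [m2], [[quad(a, s2)]])


# Equal covariances: J reduces to Mahalanobis's distance delta' Sigma^{-1} delta,
# and the direction a = Sigma^{-1} delta attains J' = J.
mu1, mu2 = [1.0, 0.0], [0.0, 0.0]
sigma = [[2.0, 0.5], [0.5, 1.0]]
delta = [x - y for x, y in zip(mu1, mu2)]
fisher = [sum(r * d for r, d in zip(row, delta)) for row in mat_inv(sigma)]
print(divergence(mu1, sigma, mu2, sigma))                  # both lines print approximately 4/7
print(linear_divergence(fisher, mu1, sigma, mu2, sigma))

# Unequal covariances: no direction found by random search exceeds J(1,2).
sigma2 = [[1.0, 0.0], [0.0, 3.0]]
rng = random.Random(0)
best = max(linear_divergence([rng.gauss(0, 1), rng.gauss(0, 1)], mu1, sigma, mu2, sigma2)
           for _ in range(2000))
print(best, divergence(mu1, sigma, mu2, sigma2))
```

With unequal covariance matrices the best linear function may fall short of $J(1,2)$, and the ratio $J'(1,2)/J(1,2)$ is the kind of discriminating-efficiency measure the abstract describes.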

Citation

S. Kullback. "An Application of Information Theory to Multivariate Analysis." Ann. Math. Statist. 23 (1) 88-102, March, 1952. https://doi.org/10.1214/aoms/1177729487