Minimum Impurity Partitions

David Burshtein; Vincent Della Pietra; Dimitri Kanevsky; Arthur Nadas

doi:10.1214/aos/1176348789

September, 1992 Minimum Impurity Partitions

David Burshtein, Vincent Della Pietra, Dimitri Kanevsky, Arthur Nadas

Ann. Statist. 20(3): 1637-1646 (September, 1992). DOI: 10.1214/aos/1176348789

Abstract

Let $(X, U)$ be jointly distributed on $\mathscr{X} \times \mathscr{R}^n$. Let $Y = E(U\mid X)$ and let $\mathscr{U}$ be the convex hull of the range of $U$. Let $C: \mathscr{X} \rightarrow \mathscr{C} = \{1,2,\ldots,k\}, k \geq 1$, induce a measurable $k$ way partition $\{\mathscr{X}_1,\ldots,\mathscr{X}_k\}$ of $\mathscr{X}$. Define the impurity of $\mathscr{X}_c = C^{-1}(c)$ to be $\phi(c, E(U\mid C(X) = c))$, where $\phi: \mathscr{C} \times \mathscr{U} \rightarrow \mathscr{R}^1$ is a concave function in its second argument. Define the impurity $\Psi$ of the partition as the average impurity of its members: $\Psi(C) = E\phi(C(X), E(U\mid C(X))$. We show that for any $C: \mathscr{X} \rightarrow \mathscr{C}$ there exists a mapping $\tilde{C}: \mathscr{U} \rightarrow \mathscr{C}$, such that $\Psi(\tilde{C}(Y)) \leq \Psi(C)$ and such that $\tilde{C}^{-1}(c)$ is convex, that is, for each $i, j \in C, i \neq j$, there exists a separating hyperplane between $\tilde{C}^{-1}(i)$ and $\tilde{C}^{-1}(j)$. This generalizes some results in statistics and information theory. Suitable choices of $U$ and $\phi$ lead to optimal partitions of simple form useful in the construction of classification trees and multidimensional regression trees.

Citation

Download Citation

David Burshtein. Vincent Della Pietra. Dimitri Kanevsky. Arthur Nadas. "Minimum Impurity Partitions." Ann. Statist. 20 (3) 1637 - 1646, September, 1992. https://doi.org/10.1214/aos/1176348789