September, 1992
Minimum Impurity Partitions
David Burshtein, Vincent Della Pietra, Dimitri Kanevsky, Arthur Nadas
Ann. Statist. 20(3): 1637-1646 (September, 1992). DOI: 10.1214/aos/1176348789

Abstract

Let $(X, U)$ be jointly distributed on $\mathscr{X} \times \mathscr{R}^n$. Let $Y = E(U\mid X)$ and let $\mathscr{U}$ be the convex hull of the range of $U$. Let $C: \mathscr{X} \rightarrow \mathscr{C} = \{1,2,\ldots,k\}, k \geq 1$, induce a measurable $k$-way partition $\{\mathscr{X}_1,\ldots,\mathscr{X}_k\}$ of $\mathscr{X}$. Define the impurity of $\mathscr{X}_c = C^{-1}(c)$ to be $\phi(c, E(U\mid C(X) = c))$, where $\phi: \mathscr{C} \times \mathscr{U} \rightarrow \mathscr{R}^1$ is a concave function in its second argument. Define the impurity $\Psi$ of the partition as the average impurity of its members: $\Psi(C) = E\phi(C(X), E(U\mid C(X)))$. We show that for any $C: \mathscr{X} \rightarrow \mathscr{C}$ there exists a mapping $\tilde{C}: \mathscr{U} \rightarrow \mathscr{C}$, such that $\Psi(\tilde{C}(Y)) \leq \Psi(C)$ and such that $\tilde{C}^{-1}(c)$ is convex, that is, for each $i, j \in \mathscr{C}, i \neq j$, there exists a separating hyperplane between $\tilde{C}^{-1}(i)$ and $\tilde{C}^{-1}(j)$. This generalizes some results in statistics and information theory. Suitable choices of $U$ and $\phi$ lead to optimal partitions of simple form useful in the construction of classification trees and multidimensional regression trees.
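As an illustration only, the following minimal Python sketch checks the stated result numerically on a toy two-class problem. Everything in it is an assumption introduced for the example and not taken from the paper: $\mathscr{X}$ has five points, $U$ is the one-hot class indicator (so $Y$ is the vector of class posteriors), $\phi$ is the Gini index and does not depend on the cell label $c$, and the names `gini`, `partition_impurity` and `is_threshold_split` are hypothetical. Because $Y$ then lies on a line segment, a partition of $Y$ into convex cells is simply a threshold on the class-1 posterior.

```python
from itertools import product


def gini(u):
    """Gini index of a probability vector u; concave in u."""
    return 1.0 - sum(p * p for p in u)


def partition_impurity(weights, cond_means, labels, k, phi=gini):
    """Psi(C) = sum over cells c of P(C = c) * phi(E[U | C = c]).

    weights[i]    : P(X = x_i)
    cond_means[i] : Y(x_i) = E[U | X = x_i]
    labels[i]     : cell index C(x_i) in {0, ..., k-1}
    """
    dim = len(cond_means[0])
    total = 0.0
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        mass = sum(weights[i] for i in idx)
        if mass == 0.0:
            continue  # an empty cell contributes nothing to the average
        # E[U | C = c] equals the weighted mean of Y over the cell.
        mean = [sum(weights[i] * cond_means[i][j] for i in idx) / mass
                for j in range(dim)]
        total += mass * phi(mean)
    return total


def is_threshold_split(labels, scores):
    """True if the two cells are separated by a threshold on scores."""
    a = [s for s, lab in zip(scores, labels) if lab == 0]
    b = [s for s, lab in zip(scores, labels) if lab == 1]
    if not a or not b:
        return True  # the trivial partition is convex as well
    return max(a) < min(b) or max(b) < min(a)


# Toy data (assumed): P(X = x_i) and P(class 1 | X = x_i).
weights = [0.10, 0.30, 0.20, 0.25, 0.15]
p1 = [0.90, 0.20, 0.60, 0.10, 0.75]
Y = [(1.0 - p, p) for p in p1]

# Enumerate every 2-way partition of the five points.
all_labelings = list(product(range(2), repeat=len(weights)))
best_all = min(partition_impurity(weights, Y, lab, 2)
               for lab in all_labelings)
best_threshold = min(partition_impurity(weights, Y, lab, 2)
                     for lab in all_labelings
                     if is_threshold_split(lab, p1))

print("minimum over all partitions:      ", best_all)
print("minimum over threshold partitions:", best_threshold)
```

In this toy setting the two minima coincide, which is what the theorem predicts: no arbitrary partition of $\mathscr{X}$ can beat the best partition defined by convex cells in $Y$. The practical benefit is that the search can be restricted to the few threshold splits on the sorted posteriors instead of all $2^5$ labelings.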

Citation

David Burshtein. Vincent Della Pietra. Dimitri Kanevsky. Arthur Nadas. "Minimum Impurity Partitions." Ann. Statist. 20 (3) 1637 - 1646, September, 1992. https://doi.org/10.1214/aos/1176348789

Information

Published: September, 1992
First available in Project Euclid: 12 April 2007

zbMATH: 0781.62094
MathSciNet: MR1186270
Digital Object Identifier: 10.1214/aos/1176348789

Subjects:
Primary: 62H30
Secondary: 62C05, 62C10, 62J02, 68T05, 68T10

Keywords: CART, classification, decision theory, decision trees, discrimination, partitioning, regression

Rights: Copyright © 1992 Institute of Mathematical Statistics
