## Electronic Journal of Statistics

### PPtree: Projection pursuit classification tree

#### Abstract

In this paper, we propose a new classification tree, the projection pursuit classification tree (PPtree), which combines tree-structured methods with projection pursuit dimension reduction. The tree originates from the projection pursuit method for classification. At each node, one of the projection pursuit indices that use class information (the LDA, $L_{r}$, or PDA index) is maximized to find the projection that best separates the groups. The tree splitting criteria are then applied to this optimized projection of the data to separate the groups. These steps are iterated until the last two classes are separated. The main advantages of this tree are that it effectively uses correlation between variables to find separations, and that it provides a visual representation of the differences between groups in a 1-dimensional space, which can be used to interpret the results. In addition, at each node of the tree, the projection coefficients represent the importance of each variable for the group separation. This information is very helpful for selecting variables in classification problems.
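The node-by-node procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the LDA index only, and it uses the fact that in one dimension that index is maximized by the leading eigenvector of $W^{-1}B$ (within-class and between-class scatter). The rule for forming two groups (the largest gap between sorted projected class means) and the cut point (the midpoint of the two group means) are simplifying assumptions. All function names and the dictionary layout are hypothetical.

```python
import numpy as np

def lda_direction(X, y):
    """Unit vector maximizing the between/within scatter ratio in 1-D."""
    p = X.shape[1]
    overall_mean = X.mean(axis=0)
    W = np.zeros((p, p))  # within-class scatter
    B = np.zeros((p, p))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        centered = Xc - Xc.mean(axis=0)
        W += centered.T @ centered
        m = (Xc.mean(axis=0) - overall_mean)[:, None]
        B += len(Xc) * (m @ m.T)
    # Small ridge so that W is invertible (an assumption of this sketch).
    W += 1e-8 * max(np.trace(W) / p, 1.0) * np.eye(p)
    evals, evecs = np.linalg.eig(np.linalg.solve(W, B))
    a = np.real(evecs[:, np.argmax(np.real(evals))])
    return a / np.linalg.norm(a)

def build_node(X, y):
    """Recursively split groups of classes until each leaf holds one class."""
    classes = np.unique(y)
    if len(classes) == 1:
        return {"label": classes[0]}
    a = lda_direction(X, y)
    z = X @ a
    means = np.array([z[y == c].mean() for c in classes])
    order = np.argsort(means)
    if len(classes) > 2:
        # Form two groups of classes at the largest gap between sorted means,
        # then re-optimize the projection for that two-group problem.
        k = int(np.argmax(np.diff(means[order]))) + 1
        left_classes = classes[order[:k]]
        a = lda_direction(X, np.isin(y, left_classes).astype(int))
        z = X @ a
    else:
        left_classes = classes[order[:1]]
    in_left = np.isin(y, left_classes)
    m_left, m_right = z[in_left].mean(), z[~in_left].mean()
    if m_left > m_right:
        # Eigenvector sign is arbitrary; orient so the left group projects lower.
        a, m_left, m_right = -a, -m_left, -m_right
    return {
        "proj": a,                       # projection coefficients at this node
        "cut": (m_left + m_right) / 2,   # midpoint rule (assumption)
        "left": build_node(X[in_left], y[in_left]),
        "right": build_node(X[~in_left], y[~in_left]),
    }

def predict_one(node, x):
    """Route one observation down the tree using each node's 1-D projection."""
    while "label" not in node:
        node = node["left"] if x @ node["proj"] < node["cut"] else node["right"]
    return node["label"]
```

In this sketch, every observation of a class follows the same branch during training, so a problem with $G$ classes yields exactly $G$ leaves. The entries of `proj` at a node play the role of the projection coefficients mentioned in the abstract; comparing their magnitudes is only meaningful if the variables are on comparable scales.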

#### Article information

Source
Electron. J. Statist., Volume 7 (2013), 1369-1386.

Dates
First available in Project Euclid: 10 May 2013

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1368193535

Digital Object Identifier
doi:10.1214/13-EJS810

Mathematical Reviews number (MathSciNet)
MR3063611

Zentralblatt MATH identifier
1336.62185

#### Citation

Lee, Yoon Dong; Cook, Dianne; Park, Ji-won; Lee, Eun-Kyung. PPtree: Projection pursuit classification tree. Electron. J. Statist. 7 (2013), 1369–1386. doi:10.1214/13-EJS810. https://projecteuclid.org/euclid.ejs/1368193535
