Institute of Mathematical Statistics Collections

Projection pursuit for discrete data

Persi Diaconis, Julia Salzman

Abstract

This paper develops projection pursuit for discrete data using the discrete Radon transform. Discrete projection pursuit is presented as an exploratory method for finding informative low dimensional views of data such as binary vectors, rankings, phylogenetic trees or graphs. We show that for most data sets, most projections are close to uniform. Thus, the informative summaries are those that deviate from uniformity. Syllabic data from several of Plato’s great works are used to illustrate the methods. Some basic distribution theory is developed, and an automated procedure for computing informative projections is introduced.
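As a rough illustration of the idea for binary vectors, the sketch below treats each nonzero direction a in Z_2^k as a projection. It sends x to a·x mod 2, which amounts to evaluating the discrete Radon transform of the empirical distribution on the pair of parallel hyperplanes {x : a·x = 0} and {x : a·x = 1}. Each direction is scored by how far that projection is from uniform. The function names, the exhaustive search over directions, and the choice of total variation distance as the non-uniformity score are all assumptions made for this sketch. They are not taken from the paper, whose automated procedure may differ.

```python
from collections import Counter
from itertools import product


def project(data, a):
    """Empirical distribution of x -> a.x mod 2 over the data.

    Equivalently, the (normalized) discrete Radon transform of the data
    evaluated on the two hyperplanes {a.x = 0} and {a.x = 1} of Z_2^k.
    """
    counts = Counter(sum(ai * xi for ai, xi in zip(a, x)) % 2 for x in data)
    n = len(data)
    return [counts[0] / n, counts[1] / n]


def tv_from_uniform(p):
    """Total variation distance between p and the uniform law on len(p) points.

    (Assumed score; other measures of non-uniformity could be substituted.)
    """
    m = len(p)
    return 0.5 * sum(abs(pi - 1 / m) for pi in p)


def rank_projections(data, k):
    """Score every nonzero direction a in Z_2^k, least uniform first."""
    scores = []
    for a in product((0, 1), repeat=k):
        if any(a):  # the zero direction gives a trivial projection
            scores.append((tv_from_uniform(project(data, a)), a))
    scores.sort(reverse=True)
    return scores


# Hypothetical toy data: vectors in Z_2^3 whose first two coordinates agree.
toy = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
print(rank_projections(toy, 3)[0])  # → (0.5, (1, 1, 0))
```

In the toy data every projection except a = (1, 1, 0) is exactly uniform. That direction picks out the hidden constraint x1 = x2, so it is the only one flagged as informative, in line with the abstract's point that informative views are the ones that deviate from uniformity.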

Primary Subjects: 44A12, 62K10, 90C08
Keywords: binary vector; discrete data; discrete Radon transform; least uniform partition; phylogenetic tree; Plato; projection pursuit; ranking; syllable patterns
Full-text: Open access
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.imsc/1207580088
Digital Object Identifier: doi:10.1214/193940307000000482

2012 © Institute of Mathematical Statistics
