Institute of Mathematical Statistics Collections

Projection pursuit for discrete data

Persi Diaconis and Julia Salzman

Full-text: Open access


This paper develops projection pursuit for discrete data using the discrete Radon transform. Discrete projection pursuit is presented as an exploratory method for finding informative low-dimensional views of data such as binary vectors, rankings, phylogenetic trees or graphs. We show that for most data sets, most projections are close to uniform. Thus, the informative summaries are those that deviate from uniformity. Syllabic data from several of Plato’s great works are used to illustrate the methods. Along with some basic distribution theory, an automated procedure for computing informative projections is introduced.
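The abstract does not spell out the procedure, so the following is only a minimal sketch of the core idea ("informative projections are the ones far from uniform"), under assumptions that are ours and not the authors': the data are binary vectors in {0,1}^k, the candidate projections are the parity functionals x ↦ a·x (mod 2) for nonzero a, which split {0,1}^k into two hyperplane classes as in the Radon transform on Z_2^k, and departure from uniformity is scored by total variation distance from (1/2, 1/2). The function names and the toy data set are hypothetical, and the paper's own automated procedure may differ.

```python
from itertools import product

def parity_projection(data, a):
    """Hypothetical helper: count vectors x with a.x = 0 and a.x = 1 (mod 2)."""
    counts = [0, 0]
    for x in data:
        bit = sum(ai * xi for ai, xi in zip(a, x)) % 2
        counts[bit] += 1
    return counts

def distance_from_uniform(counts):
    """Total variation distance between a two-block histogram and (1/2, 1/2).

    For two blocks, TV = (|p0 - 1/2| + |p1 - 1/2|) / 2 = |p0 - 1/2|.
    """
    n = sum(counts)
    return abs(counts[0] / n - 0.5)

def least_uniform_projection(data):
    """Scan every nonzero parity functional and return (a, score) with the largest score."""
    k = len(data[0])
    best = None
    for a in product((0, 1), repeat=k):
        if not any(a):
            continue  # a = 0 puts every vector in one class: a trivial projection
        score = distance_from_uniform(parity_projection(data, a))
        if best is None or score > best[1]:
            best = (a, score)
    return best

# Made-up toy data in which the first two coordinates always agree.
toy_data = [(0, 0, 1), (1, 1, 0), (1, 1, 1), (0, 0, 0), (1, 1, 1), (0, 0, 1)]
print(least_uniform_projection(toy_data))  # → ((1, 1, 0), 0.5)
```

Here the functional a = (1, 1, 0) sends every toy vector to 0, so its projection is as far from uniform as possible, while most other functionals split the data evenly. Exhaustive search over all 2^k − 1 functionals is only practical for small k.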

Chapter information

Deborah Nolan and Terry Speed, eds., Probability and Statistics: Essays in Honor of David A. Freedman (Beachwood, Ohio, USA: Institute of Mathematical Statistics, 2008), 265–288

First available in Project Euclid: 7 April 2008


Digital Object Identifier: doi:10.1214/193940307000000482


Mathematics Subject Classification, Primary: 44A12 (Radon transform [See also 92C55]); 62K10 (Block designs); 90C08 (Special problems of linear programming: transportation, multi-index, etc.)

Keywords: binary vector; discrete data; discrete Radon transform; least uniform partition; phylogenetic tree; Plato; projection pursuit; ranking; syllable patterns

Copyright © 2008, Institute of Mathematical Statistics


Diaconis, Persi; Salzman, Julia. Projection pursuit for discrete data. Probability and Statistics: Essays in Honor of David A. Freedman, 265–288, Institute of Mathematical Statistics, Beachwood, Ohio, USA, 2008. doi:10.1214/193940307000000482.


