Institute of Mathematical Statistics Collections

Projection pursuit for discrete data

Persi Diaconis and Julia Salzman


Abstract

This paper develops projection pursuit for discrete data using the discrete Radon transform. Discrete projection pursuit is presented as an exploratory method for finding informative low-dimensional views of data such as binary vectors, rankings, phylogenetic trees or graphs. We show that for most data sets, most projections are close to uniform. Informative summaries are therefore those that deviate from uniformity. Syllabic data from several of Plato’s great works is used to illustrate the methods. Along with some basic distribution theory, an automated procedure for computing informative projections is introduced.
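As a rough illustration of the idea that informative projections are the ones far from uniform, the sketch below works with binary vectors only. It is not the authors' procedure: the choice of parity projections (splitting the data by the value of a mod-2 sum over a subset of coordinates), the total variation score, and the function names `parity_projection`, `tv_from_uniform` and `least_uniform_projection` are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def parity_projection(data, mask):
    """Project binary vectors onto the two cells <mask, x> = 0 or 1 (mod 2).

    Returns the counts [n0, n1] of vectors landing in each cell.
    """
    counts = Counter(sum(m * x for m, x in zip(mask, v)) % 2 for v in data)
    return [counts.get(0, 0), counts.get(1, 0)]


def tv_from_uniform(counts):
    """Total variation distance between the empirical cell frequencies
    and the uniform distribution on the cells."""
    n = sum(counts)
    u = 1.0 / len(counts)
    return 0.5 * sum(abs(c / n - u) for c in counts)


def least_uniform_projection(data, k):
    """Exhaustively score every nonzero mask in {0,1}^k and return the
    (mask, score) pair whose projection is farthest from uniform.

    Exhaustive search costs 2^k - 1 projections, so this is only
    practical for small k (an assumption of this sketch).
    """
    best = None
    for mask in product((0, 1), repeat=k):
        if not any(mask):
            continue  # the zero mask puts every vector in one cell
        score = tv_from_uniform(parity_projection(data, mask))
        if best is None or score > best[1]:
            best = (mask, score)
    return best


# Toy data in which coordinates 0 and 2 always agree.
data = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]
mask, score = least_uniform_projection(data, 3)
print(mask, score)
```

On this toy data every single-coordinate projection is exactly uniform, and only the mask that picks out coordinates 0 and 2 together exposes the structure, which is the sense in which a non-uniform projection is informative.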

Chapter information

Source
Deborah Nolan and Terry Speed, eds., Probability and Statistics: Essays in Honor of David A. Freedman (Beachwood, Ohio, USA: Institute of Mathematical Statistics, 2008), 265-288

Dates
First available in Project Euclid: 7 April 2008

Permanent link to this document
https://projecteuclid.org/euclid.imsc/1207580088

Digital Object Identifier
doi:10.1214/193940307000000482

Zentralblatt MATH identifier
1166.62048

Subjects
Primary:
44A12: Radon transform [See also 92C55]
62K10: Block designs
90C08: Special problems of linear programming (transportation, multi-index, etc.)

Keywords
binary vector; discrete data; discrete Radon transform; least uniform partition; phylogenetic tree; Plato; projection pursuit; ranking; syllable patterns

Rights
Copyright © 2008, Institute of Mathematical Statistics

Citation

Diaconis, Persi; Salzman, Julia. Projection pursuit for discrete data. Probability and Statistics: Essays in Honor of David A. Freedman, 265–288, Institute of Mathematical Statistics, Beachwood, Ohio, USA, 2008. doi:10.1214/193940307000000482. https://projecteuclid.org/euclid.imsc/1207580088


