The Annals of Applied Statistics

Horseshoes in multidimensional scaling and local kernel methods

Persi Diaconis, Sharad Goel, and Susan Holmes

Full-text: Open access

Abstract

Classical multidimensional scaling (MDS) is a method for visualizing high-dimensional point clouds by mapping to low-dimensional Euclidean space. This mapping is defined in terms of eigenfunctions of a matrix of interpoint dissimilarities. In this paper we analyze in detail multidimensional scaling applied to a specific dataset: the 2005 United States House of Representatives roll call votes. Certain MDS and kernel projections output “horseshoes” that are characteristic of dimensionality reduction techniques. We show that, in general, a latent ordering of the data gives rise to these patterns when one only has local information. That is, when only the interpoint distances for nearby points are known accurately. Our results provide a rigorous set of results and insight into manifold learning in the special case where the manifold is a curve.

Article information

Source
Ann. Appl. Stat. Volume 2, Number 3 (2008), 777-807.

Dates
First available in Project Euclid: 13 October 2008

Permanent link to this document
http://projecteuclid.org/euclid.aoas/1223908041

Digital Object Identifier
doi:10.1214/08-AOAS165

Zentralblatt MATH identifier
1149.62316

Mathematical Reviews number (MathSciNet)
MR2516794

Citation

Diaconis, Persi; Goel, Sharad; Holmes, Susan. Horseshoes in multidimensional scaling and local kernel methods. Ann. Appl. Stat. 2 (2008), no. 3, 777--807. doi:10.1214/08-AOAS165. http://projecteuclid.org/euclid.aoas/1223908041.


Export citation

References

  • Albouy, A. (2004). Mutual distances in celestial mechanics. Lectures at Nankai Institute, Tianjin, China. Available at http://www.imcce.fr/fr/presentation/equipes/ASD/preprints/prep.2004/Albouy_Nankai09_2004.pdf.
  • Americans for Democratic Action (2005). ADA Congressional voting record—U.S. House of Representatives. Available at http://www.adaction.org.
  • Bogomolny, E., Bohigas, O. and Schmit, C. (2003). Spectral properties of distance matrices. J. Phys. A: Math. Gen. 36 3595–3616. Available at http://www.citebase.org/abstract?id=oai:arXiv.org:nlin/0301044.
  • Borchardt, C. W. (1866). Ueber die aufgabe des maximum, welche der bestimmung des tetraeders von grösstem volumen bei gegebenem flächeninhalt der seitenflächen für mehr als drei dimensionen entspricht. Math. Abhand. Akad. Wiss. Berlin 121–155.
  • Borg, I. and Groenen, P. (1997). Modern Multidimensional Scaling: Theory and Applications. Springer, New York.
  • Burden, B. C., Caldeira, G. A. and Groseclose, T. (2000). Measuring the ideologies of U. S. senators: The song remains the same. Legislative Studies Quarterly 25 237–258.
  • Cantoni, A. and Butler, P. (1976). Eigenvalues and eigenvectors of symmetric centrosymmetrlc matrices. Linear Algebra Appl. 13 275–288.
  • Clinton, J., Jackman, S. and Rivers, D. (2004). The statistical analysis of roll call data. American Political Science Review 355–370.
  • Coombs, C. H. (1964). A Theory of Data. Wiley, New York.
  • Cox, T. F. and Cox, M. A. A. (2000). Multidimensional Scaling. Chapman and Hall, London.
  • Diaconis, P., Goel, S. and Holmes, S. (2008). Supplement to “Horseshoes in multidimensional scaling and local kernel methods.” DOI: 10.1214/08-AOAS165SUPP.
  • de Leeuw, J. (2005). Multidimensional unfolding. In Encyclopedia of Statistics in Behavioral Science. Wiley, New York.
  • de Leeuw, J. (2007). A horseshoe for multidimensional scaling. Technical report, UCLA.
  • Dufrene, M. and Legendre, P. (1991). Geographic structure and potential ecological factors in Belgium. J. Biogeography. Available at http://links.jstor.org/sici?sici=0305-0270(199105)18%253A3%253C257%253A%GSAPEF%253E2.0.CO%253B2-F.
  • Good, I. J. (1970). The inverse of a centrosymmetric matrix. Technometrics 12 925–928.
  • Guttman, L. (1968). A general nonmetric technique for finding the smallest coordinate space for a configuration of …. Psychometrika. Available at http://www.springerlink.com/index/AG2018142W42704L.pdf.
  • Heckman, J. J. and Snyder, J. M. (1997). Linear probability models of the demand for attributes with an empirical application to estimating the preferences of legislators. RAND J. Economics 28 S142–S189.
  • Hill, M. O. and Gauch, H. G. (1980). Detrended correspondence analysis, an improved ordination technique. Vegetatio 42 47–58.
  • Iwatsubo, S. (1984). The analytical solutions of an eigenvalue problem in the case of applying optimal scoring method to some types of data. In Data Analysis and Informatics III 31–40. North-Holland, Amsterdam.
  • Kendall, D. G. (1970). A mathematical approach to seriation. Phil. Trans. Roy. Soc. London 269 125–135.
  • Mardia, K., Kent, J. and Bibby, J. (1979). Multivariate Analysis. Academic Press, New York.
  • Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation 15 1373–1396. Available at http://www.mitpressjournals.org/doi/abs/10.1162/089976603321780317.
  • Office of the Clerk—U.S. House of Representatives. (2005). U.S. House of Representatives roll call votes 109th Congress—1st session. Available at http://clerk.house.gov.
  • Palmer, M. (2008). Ordination methods for ecologists. Available at http://ordination.okstate.edu/.
  • Parlett, B. N. (1980). The Symmetric Eigenvalue Problem. Prentice Hall, Englewood Cliffs, NJ.
  • Podani, J. and Miklos, I. (2002). Resemblance coefficients and the horseshoe effect in principal coordinates analysis. Ecology 3331–3343.
  • Roweis, S. T. and Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science 2323–2326.
  • Schoenberg, I. J. (1935). Remarks to Maurice Fréchet’s article “Sur la définition axiomatique d’une classe d’espace distanciés vectoriellement applicable sur l’espace de Hilbert.” Ann. of Math. (2) 36 724–732.
  • Schölkopf, B., Smola, A. and Muller, K.-R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation 10 1299–1319. Available at http://www.mitpressjournals.org/doi/abs/10.1162/089976698300017467.
  • Shepard, R. N. (1962). The analysis of proximities: Multidimensional scaling with an unknown distance function. I. Psychometrika 27 125–140.
  • Shi, J. and Malik, J. (2000). Normalized cuts and image segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence 22 888–905. Available at citeseer.ist.psu.edu/shi97normalized.html.
  • Tenenbaum, J. B., de Silva, V. and Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science 2319–2323.
  • ter Braak, C. (1985). Correspondence analysis of incidence and abundance data: Properties in terms of a unimodal response …. Biometrics. Available at http://links.jstor.org/sici?sici=0006-341X(198512)41%253A4%253C859%253A%CAOIAA%253E2.0.CO%253B2-S.
  • ter Braak, C. J. F. (1987). Ordination. In Data Analysis in Community and Landscape Ecology 81–173. Center for Agricultural Publishing and Documentation. Wageningen, The Netherlands.
  • ter Braak, C. and Prentice, I. (1988). A theory of gradient analysis. Advances in Ecological Research. Available at http://cat.inist.fr/?aModele=afficheN&cpsidt=7248779.
  • Torgerson, W. S. (1952). Multidimensional scaling. I. Theory and method. Psychometrika 17 401–419.
  • von Luxburg, U., Belkin, M. and Bousquet, O. (2008). Consistency of spectral clustering. Ann. Statist. 36 555–586.
  • Wartenberg, D., Ferson, S. and Rohlf, F. (1987). Putting things in order: A critique of detrended correspondence analysis. The American Naturalist. Available at http://links.jstor.org/sici?sici=0003-0147(198703)129%253A3%253C434%253%APTIOAC%253E2.0.CO%253B2-3.
  • Weaver, J. R. (1985). Centrosymmetric (cross-symmetric) matrices, their basic properties, eigenvalues, and eigenvectors. Amer. Math. Monthly 92 711–717.
  • Williams, C. K. (2000). On a connection between kernel PCA and metric multidimensional scaling. In NIPS 675–681.
  • Young, G. and Householder, A. S. (1938). Discussion of a set of points in terms of their mutual distances. Psychometrika 3 19–22.

Supplemental materials