In biological experiments, researchers often have information in
the form of a graph that supplements the observed numerical data.
Incorporating the knowledge contained in these graphs into an
analysis of the numerical data is an important and nontrivial
task. We look at the example of metagenomic data, that is, data from
a genomic survey of the abundance of different bacterial species
in a sample. Here, the graph of interest is a phylogenetic tree
depicting the relationships among the bacterial species.
We illustrate that analysis of the data in a
nonstandard inner-product space effectively uses this additional
graphical information and produces more meaningful results.