Open Access
December 2011
The duality diagram in data analysis: Examples of modern applications
Omar De la Cruz, Susan Holmes
Ann. Appl. Stat. 5(4): 2266-2277 (December 2011). DOI: 10.1214/10-AOAS408
Abstract

Today’s data-heavy research environment requires the integration of different sources of information into structured data sets that cannot be analyzed as simple matrices. We introduce an old technique, known in European data analysis circles as the Duality Diagram Approach, which is put to new uses through a variety of metrics and ways of combining different diagrams. This issue of the Annals of Applied Statistics contains contemporary examples of how this approach provides solutions to hard problems in data integration. We present here the genesis of the technique and show how it can be seen as a precursor of modern kernel-based approaches.

Copyright © 2011 Institute of Mathematical Statistics
Omar De la Cruz and Susan Holmes "The duality diagram in data analysis: Examples of modern applications," The Annals of Applied Statistics 5(4), 2266-2277, (December 2011). https://doi.org/10.1214/10-AOAS408