Today’s data-heavy research environment requires integrating different sources of information into structured data sets that cannot be analyzed as simple matrices. We introduce an old technique, known in European data analysis circles as the Duality Diagram Approach, that is being put to new uses through a variety of metrics and ways of combining different diagrams. This issue of the Annals of Applied Statistics contains contemporary examples of how this approach provides solutions to hard problems in data integration. We present here the genesis of the technique and show how it can be seen as a precursor of modern kernel-based approaches.
References
Baty, F., Facompré, M., Wiegand, J., Schwager, J. and Brutsche, M. (2006). Analysis with respect to instrumental variables for the exploration of microarray data structures. BMC Bioinformatics 7 422.
Baty, F., Jaeger, D., Preiswerk, F., Schumacher, M. and Brutsche, M. (2008). Stability of gene contributions and identification of outliers in multivariate analysis of microarray data. BMC Bioinformatics 9 289.
Benzécri, J.-P. (1973). L’analyse des données: Leçons sur l’analyse factorielle et la reconnaissance des formes, et travaux du Laboratoire de statistique de l’Université de Paris VI. Dunod, Paris.
Culhane, A., Perrière, G. and Higgins, D. (2003). Cross-platform comparison and visualisation of gene expression data using co-inertia analysis. BMC Bioinformatics 4 59.
Dray, S. and Jombart, T. (2011). Revisiting Guerry’s data: Introducing spatial constraints in multivariate analysis. Ann. Appl. Statist. 5 2278–2299. MR2907115 1234.62092 10.1214/10-AOAS356 euclid.aoas/1324399595
Escoufier, Y. (1980). L’analyse conjointe de plusieurs matrices de données. In Biométrie et Temps (E. Jolivet, ed.) 59–76. Société Française de Biométrie, Paris.
Escoufier, Y. (2006). Operator related to a data matrix: A survey. In COMPSTAT 2006—Proceedings in Computational Statistics 285–297. Physica, Heidelberg. MR2330545 10.1007/978-3-7908-1709-6_22
Fagan, A., Culhane, A. and Higgins, D. (2007). A multivariate analysis approach to the integration of proteomic and gene expression data. Proteomics 7 2162–2171.
Gifi, A. (1990). Nonlinear Multivariate Analysis. Wiley, Chichester. MR1076188 0697.62048
Holmes, S. (2006). Multivariate data analysis: The French way. In Probability and Statistics: Essays in Honor of David A. Freedman (D. Nolan and T. Speed, eds.) 219–233. IMS, Beachwood, OH. MR2459953 1166.62310 10.1214/193940307000000455
Purdom, E. (2011). Analysis of a data matrix and a graph: Metagenomic data and the phylogenetic tree. Ann. Appl. Statist. 5 2326–2358. MR2907117 1234.62148 10.1214/10-AOAS402 euclid.aoas/1324399597
Shinkareva, S., Mason, R., Malave, V., Wang, W., Mitchell, T. and Just, M. (2008). Using fMRI brain activation to identify cognitive states associated with perception of tools and dwellings. PLoS One 3 e1394.
Thioulouse, J. (2011). Simultaneous analysis of a sequence of paired ecological tables: A comparison of several methods. Ann. Appl. Statist. 5 2300–2325. MR2907116 1234.62154 10.1214/10-AOAS372 euclid.aoas/1324399596