The Annals of Applied Statistics

Estimating a common covariance matrix for network meta-analysis of gene expression datasets in diffuse large B-cell lymphoma

Anders Ellern Bilgrau, Rasmus Froberg Brøndum, Poul Svante Eriksen, Karen Dybkær, and Martin Bøgsted

Full-text: Open access


Estimating covariance matrices of gene expression has many applications in cancer systems biology. Many gene expression studies, however, are hampered by small sample sizes, and it has therefore become popular to increase the sample size by combining gene expression data across studies. Motivated by traditional meta-analysis using random effects models, we present a hierarchical random covariance model and use it for the meta-analysis of gene correlation networks across 11 large-scale gene expression studies of diffuse large B-cell lymphoma (DLBCL). We propose a maximum likelihood estimator for the underlying common covariance matrix and introduce an EM algorithm for its estimation. In simulation experiments that compared the estimated covariance matrices by cophenetic correlation and Kullback–Leibler divergence, the proposed estimator performed as well as or better than a simple pooled estimator. In a post hoc analysis of the estimated common covariance matrix for the DLBCL data, we identified novel, biologically meaningful gene correlation networks with eigengenes of prognostic value. In conclusion, the method appears to provide a generally applicable framework for meta-analysis when multiple features are measured and are believed to share a common covariance matrix obscured by study-dependent noise.
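
To make the comparison in the abstract concrete, the following is a minimal sketch of the baseline only: a simple pooled covariance estimator and the Kullback–Leibler divergence between two zero-mean Gaussian models, which is one way to score an estimate against a known covariance in a simulation. It is not the authors' hierarchical model or EM algorithm. The function names `pooled_covariance` and `gaussian_kl`, the dimension, and the simulated study sizes are assumptions made for illustration, and the simulated studies here share one covariance exactly, without study-dependent noise.

```python
import numpy as np


def pooled_covariance(datasets):
    """Pooled estimate: summed centered scatter matrices over total degrees of freedom."""
    centered = [x - x.mean(axis=0) for x in datasets]
    scatter = sum(c.T @ c for c in centered)
    dof = sum(x.shape[0] - 1 for x in datasets)
    return scatter / dof


def gaussian_kl(sigma_true, sigma_est):
    """KL( N(0, sigma_true) || N(0, sigma_est) ) for positive definite matrices."""
    p = sigma_true.shape[0]
    m = np.linalg.solve(sigma_est, sigma_true)  # sigma_est^{-1} @ sigma_true
    _, logdet = np.linalg.slogdet(m)
    return 0.5 * (np.trace(m) - p - logdet)


# Hypothetical simulation: three studies of different sizes drawn from one covariance.
rng = np.random.default_rng(0)
p = 4
a = rng.standard_normal((p, p))
sigma = a @ a.T + p * np.eye(p)  # an arbitrary positive definite "true" covariance
studies = [rng.multivariate_normal(np.zeros(p), sigma, size=n) for n in (20, 35, 50)]

pooled = pooled_covariance(studies)
print("KL divergence of pooled estimate from truth:", gaussian_kl(sigma, pooled))
```

A smaller divergence indicates an estimate closer to the true covariance; in a fuller simulation, the same score would be computed for each competing estimator over repeated draws.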

Article information

Ann. Appl. Stat., Volume 12, Number 3 (2018), 1894-1913.

Received: June 2016
Revised: June 2017
First available in Project Euclid: 11 September 2018

Digital Object Identifier: doi:10.1214/18-AOAS1136

Keywords

Covariance estimation; precision estimation; integrative analysis; meta-analysis; network analysis


Bilgrau, Anders Ellern; Brøndum, Rasmus Froberg; Eriksen, Poul Svante; Dybkær, Karen; Bøgsted, Martin. Estimating a common covariance matrix for network meta-analysis of gene expression datasets in diffuse large B-cell lymphoma. Ann. Appl. Stat. 12 (2018), no. 3, 1894–1913. doi:10.1214/18-AOAS1136.


  • Agnelli, L., Forcato, M., Ferrari, F., Tuana, G., Todoerti, K., Walker, B. A., Morgan, G. J., Lombardi, L., Bicciato, S. and Neri, A. (2011). The reconstruction of transcriptional networks reveals critical genes with implications for clinical outcome of multiple myeloma. Clin. Cancer Res. 17 7402–7412.
  • Bilgrau, A. E. (2014). correlateR: Fast, efficient, and robust partial correlations. R package version 0.1. Available at
  • Bilgrau, A. E., Brøndum, R. F., Eriksen, P. S., Dybkær, K. and Bøgsted, M. (2018). Supplement to “Estimating a common covariance matrix for network meta-analysis of gene expression datasets in diffuse large B-cell lymphoma.” DOI:10.1214/18-AOAS1136SUPPA.
  • Borenstein, M., Hedges, L. V., Higgins, J. P. and Rothstein, H. R. (2010). A basic introduction to fixed-effect and random-effects models for meta-analysis. Res. Synth. Methods 1 97–111.
  • Cheng, P., Corzo, C. A., Luetteke, N., Yu, B., Nagaraj, S., Bui, M. M., Ortiz, M., Nacken, W., Sorg, C., Vogl, T. et al. (2008). Inhibition of dendritic cell differentiation and accumulation of myeloid-derived suppressor cells in cancer is regulated by S100A9 protein. J. Exp. Med. 205 2235–2249.
  • Choi, J. K., Yu, U., Kim, S. and Yoo, O. J. (2003). Combining multiple microarray studies and modeling interstudy variation. Bioinformatics 19 i84–i90.
  • Clarke, C., Madden, S. F., Doolan, P., Aherne, S. T., Joyce, H., O’Driscoll, L., Gallagher, W. M., Hennessy, B. T., Moriarty, M., Crown, J., Kennedy, S. and Clynes, M. (2013). Correlating transcriptional networks to breast cancer survival: A large-scale coexpression analysis. Carcinogenesis 34 2300–2308.
  • Compagno, M., Lim, W. K., Grunn, A., Nandula, S. V., Brahmachary, M., Shen, Q., Bertoni, F., Ponzoni, M., Scandurra, M., Califano, A. et al. (2009). Mutations of multiple genes cause deregulation of NF-$\kappa$B in diffuse large B-cell lymphoma. Nature 459 717–721.
  • Dai, M., Wang, P., Boyd, A. D., Kostov, G., Athey, B., Jones, E. G., Bunney, W. E., Myers, R. M., Speed, T. P., Akil, H., Watson, S. J. and Meng, F. (2005). Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res. 33 e175.
  • Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B 39 1–38.
  • DerSimonian, R. and Laird, N. (1986). Meta-analysis in clinical trials. Control. Clin. Trials 7 177–188.
  • Dybkær, K., Bøgsted, M., Falgreen, S., Bødker, J. S., Kjeldsen, M. K., Schmitz, A., Bilgrau, A. E., Xu-Monette, Z. Y., Li, L., Bergkvist, K. S., Laursen, M. B., Rodrigo-Domingo, M., Marques, S. C., Rasmussen, S. B., Nyegaard, M., Gaihede, M., Møller, M. B., Samworth, R. J., Shah, R. D., Johansen, P., El-Galaly, T. C., Young, K. H. and Johnsen, H. E. (2015). A diffuse large B-cell lymphoma classification system that associates normal B-cell subset phenotypes with prognosis. J. Clin. Oncol. 33 1379–1388.
  • Eddelbuettel, D. and François, R. (2011). Rcpp: Seamless R and C++ integration. J. Stat. Softw. 40 1–18.
  • François, R., Eddelbuettel, D. and Bates, D. (2012). RcppArmadillo: Rcpp integration for Armadillo templated linear algebra library. R package version
  • Friedman, J., Hastie, T. and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9 432–441.
  • Fulmer, T. (2008). Suppressing the suppressors. SciBX 1(38). DOI:10.1038/scibx.2008.914.
  • Galili, T. (2015). dendextend: An R package for visualizing, adjusting and comparing trees of hierarchical clustering. Bioinformatics 31 3718–3720.
  • Gautier, L., Cope, L., Bolstad, B. M. and Irizarry, R. A. (2004). affy—Analysis of affymetrix GeneChip data at the probe level. Bioinformatics 20 307–315.
  • Horvath, S. (2011). Weighted Network Analysis: Applications in Genomics and Systems Biology. Springer, Berlin.
  • Hummel, M., Bentink, S., Berger, H., Klapper, W., Wessendorf, S., Barth, T. F., Bernd, H.-W., Cogliatti, S. B., Dierlamm, J., Feller, A. C. et al. (2006). A biologic definition of Burkitt’s lymphoma from transcriptional and genomic profiling. N. Engl. J. Med. 354 2419–2430.
  • International Lymphoma Study Group (1997). A clinical evaluation of the international lymphoma study group classification of non-Hodgkin’s lymphoma. Blood 89 3909–3918.
  • Irizarry, R. A., Hobbs, B., Collin, F., Beazer-Barclay, Y. D., Antonellis, K. J., Scherf, U. and Speed, T. P. (2003). Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4 249–264.
  • Jima, D. D., Zhang, J., Jacobs, C., Richards, K. L., Dunphy, C. H., Choi, W. W., Au, W. Y., Srivastava, G., Czader, M. B., Rizzieri, D. A. et al. (2010). Deep sequencing of the small RNA transcriptome of normal and malignant human B cells identifies hundreds of novel microRNAs. Blood 116 e118–e127.
  • Johnson, W. E., Li, C. and Rabinovic, A. (2007). Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8 118–127.
  • Lee, J. A., Dobbin, K. K. and Ahn, J. (2014). Covariance adjustment for batch effect in gene expression data. Stat. Med. 33 2681–2695.
  • Lenz, G., Wright, G. W., Emre, N. T., Kohlhammer, H., Dave, S. S., Davis, R. E., Carty, S., Lam, L. T., Shaffer, A., Xiao, W. et al. (2008). Molecular subtypes of diffuse large B-cell lymphoma arise by distinct genetic pathways. Proc. Natl. Acad. Sci. USA 105 13520–13525.
  • Mattiussi, V., Tumminello, M., Iori, G. and Mantegna, R. N. (2011). Comparing correlation matrix estimators via Kullback–Leibler divergence. Preprint, DOI:10.2139/ssrn.1966714.
  • Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
  • Monti, S., Chapuy, B., Takeyama, K., Rodig, S. J., Hao, Y., Yeda, K. T., Inguilizian, H., Mermel, C., Currie, T., Dogan, A. et al. (2012). Integrative analysis reveals an outcome-associated and targetable pattern of p53 and cell cycle deregulation in diffuse large B cell lymphoma. Cancer Cell 22 359–372.
  • Phipson, B. and Smyth, G. K. (2010). Permutation $p$-values should never be zero: Calculating exact $p$-values when permutations are randomly drawn. Stat. Appl. Genet. Mol. Biol. 9 Art. 39, 14.
  • R Core Team (2012). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.
  • Reimand, J., Kolde, R. and Arak, T. (2016). gProfileR: Interface to the ‘g:Profiler’ toolkit. R package version 0.6.1.
  • Reimand, J., Arak, T., Adler, P., Kolberg, L., Reisberg, S., Peterson, H. and Vilo, J. (2016). g:Profiler—A web server for functional interpretation of gene lists (2016 update). Nucleic Acids Res. 44 W83–W89.
  • Salaverria, I., Philipp, C., Oschlies, I., Kohler, C. W., Kreuz, M., Szczepanowski, M., Burkhardt, B., Trautmann, H., Gesk, S., Andrusiewicz, M. et al. (2011). Translocations activating IRF4 identify a subtype of germinal center-derived B-cell lymphoma affecting predominantly children and young adults. Blood 118 139–147.
  • Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychol. Bull. 86 420–428.
  • Sokal, R. R. and Rohlf, F. J. (1962). The comparison of dendrograms by objective methods. Taxon 11 33–40.
  • Stroncek, D. F., Butterfield, L. H., Cannarile, M. A., Dhodapkar, M. V., Greten, T. F., Grivel, J. C., Kaufman, D. R., Kong, H. H., Korangy, F., Lee, P. P., Marincola, F., Rutella, S., Siebert, J. C., Trinchieri, G. and Seliger, B. (2017). Systematic evaluation of immune regulation and modulation. J. Immunother. Cancer 5 21.
  • van Wieringen, W. N. and Peeters, C. F. W. (2016). Ridge estimation of inverse covariance matrices from high-dimensional data. Comput. Statist. Data Anal. 103 284–303.
  • Visco, C., Li, Y., Xu-Monette, Z. Y., Miranda, R. N., Green, T. M., Tzankov, A., Wen, W., Liu, W., Kahl, B., d’Amore, E. et al. (2012). Comprehensive gene expression profiling and immunohistochemical studies support application of immunophenotypic algorithm for molecular subtype classification in diffuse large B-cell lymphoma: A report from the international DLBCL Rituximab-CHOP consortium program study. Leukemia 26 2103–2113.
  • Williams, P. M., Li, R., Johnson, N. A., Wright, G., Heath, J.-D. and Gascoyne, R. D. (2010). A novel method of amplification of FFPET-derived RNA enables accurate disease classification with microarrays. J. Mol. Diagnostics 12 680–686.
  • Xie, Y. (2013). Dynamic Documents with R and Knitr. CRC Press, Boca Raton, FL.

Supplemental materials

  • Supplement A: Appendices. Supplementary figures, tables and proofs available online.
  • Supplement B: Documents for reproducibility. The documents and other needed files to perform the analyses to reproduce this article. See the README file herein.