The Annals of Applied Statistics

Gene network reconstruction using global-local shrinkage priors

Gwenaël G. R. Leday, Mathisca C. M. de Gunst, Gino B. Kpogbezan, Aad W. van der Vaart, Wessel N. van Wieringen, and Mark A. van de Wiel

Abstract

Reconstructing a gene network from high-throughput molecular data is an important but challenging task, as the number of parameters to estimate is easily much larger than the sample size. A conventional remedy is to regularize or penalize the model likelihood. In network models, this is often done locally in the neighborhood of each node or gene. However, estimation of the many regularization parameters is often difficult and can result in large statistical uncertainties. In this paper we propose to combine local regularization with global shrinkage of the regularization parameters to borrow strength between genes and improve inference. We employ a simple Bayesian model with nonsparse, conjugate priors to facilitate the use of fast variational approximations to posteriors. We discuss empirical Bayes estimation of hyperparameters of the priors, and propose a novel approach to rank-based posterior thresholding. Using extensive model- and data-based simulations, we demonstrate that the proposed inference strategy outperforms popular (sparse) methods, yields more stable edges, and is more reproducible. The proposed method, termed ShrinkNet, is then applied to glioblastoma data to investigate the interactions between genes associated with patient survival.
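
To make the idea of combining local regularization with global shrinkage more concrete, the following is a minimal sketch in Python. It is not the ShrinkNet method: it replaces the Bayesian model, variational approximation, and empirical Bayes steps with plain node-wise ridge regression, and it pulls each node's penalty toward the geometric mean of all penalties. The function names (`nodewise_ridge`, `shrink_penalties`, `rank_edges`) and the `weight` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def shrink_penalties(local_lambdas, weight=0.5):
    """Pull node-specific penalties toward a global value.

    Assumption for illustration: the global value is the geometric mean
    of the local penalties; weight=0 keeps them as they are, weight=1
    replaces them all with the global value.
    """
    log_l = np.log(np.asarray(local_lambdas, dtype=float))
    return np.exp((1.0 - weight) * log_l + weight * log_l.mean())


def nodewise_ridge(X, lambdas):
    """Regress each column of X on all other columns (local regularization).

    Row j of the returned matrix holds the ridge coefficients for node j,
    using that node's own penalty lambdas[j]; the diagonal stays zero.
    """
    n, p = X.shape
    B = np.zeros((p, p))
    for j in range(p):
        others = [k for k in range(p) if k != j]
        Xj, y = X[:, others], X[:, j]
        A = Xj.T @ Xj + lambdas[j] * np.eye(p - 1)
        B[j, others] = np.linalg.solve(A, Xj.T @ y)
    return B


def rank_edges(B):
    """Rank undirected edges by the symmetrized absolute coefficient."""
    S = (np.abs(B) + np.abs(B.T)) / 2.0
    rows, cols = np.triu_indices_from(S, k=1)
    order = np.argsort(-S[rows, cols])
    return [(int(rows[i]), int(cols[i])) for i in order]


def simulate(n=200, seed=0):
    """Toy data: four variables, only variables 0 and 1 are linked."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, 4))
    X[:, 1] = 0.9 * X[:, 0] + 0.3 * rng.standard_normal(n)
    return X - X.mean(axis=0)


if __name__ == "__main__":
    X = simulate()
    local = np.array([0.1, 1.0, 10.0, 100.0])  # arbitrary per-node penalties
    shrunk = shrink_penalties(local, weight=0.5)
    edges = rank_edges(nodewise_ridge(X, shrunk))
    print("shrunk penalties:", shrunk)
    print("top-ranked edge:", edges[0])
```

In this toy setting the strongest ranked edge should be the one between variables 0 and 1, since that is the only dependence built into the simulated data. Ranking edges by coefficient size is only a loose stand-in for the rank-based posterior thresholding mentioned in the abstract.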

Article information

Source
Ann. Appl. Stat., Volume 11, Number 1 (2017), 41-68.

Dates
Received: February 2016
Revised: June 2016
First available in Project Euclid: 8 April 2017

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1491616871

Digital Object Identifier
doi:10.1214/16-AOAS990

Mathematical Reviews number (MathSciNet)
MR3634314

Zentralblatt MATH identifier
1366.62227

Keywords
Undirected gene network; Bayesian inference; shrinkage; variational approximation; empirical Bayes

Citation

Leday, Gwenaël G. R.; de Gunst, Mathisca C. M.; Kpogbezan, Gino B.; van der Vaart, Aad W.; van Wieringen, Wessel N.; van de Wiel, Mark A. Gene network reconstruction using global-local shrinkage priors. Ann. Appl. Stat. 11 (2017), no. 1, 41–68. doi:10.1214/16-AOAS990. https://projecteuclid.org/euclid.aoas/1491616871


Supplemental materials

  • Technical details and complementary results. We present technical and methodological details regarding the variational approximation and the different methods under comparison in Sections 3 and 4. Furthermore, complementary simulation results are provided.