The Annals of Applied Statistics

Bayesian latent hierarchical model for transcriptomic meta-analysis to detect biomarkers with clustered meta-patterns of differential expression signals

Zhiguang Huo, Chi Song, and George Tseng



Abstract

Owing to the rapid development of high-throughput experimental techniques and their rapidly falling costs, many transcriptomic datasets have been generated and accumulated in the public domain. Meta-analysis that combines multiple transcriptomic studies can increase the statistical power to detect disease-related biomarkers. In this paper we introduce a Bayesian latent hierarchical model to perform transcriptomic meta-analysis. The method can detect genes that are differentially expressed (DE) in only a subset of the combined studies, and its latent variables help quantify homogeneous and heterogeneous differential expression signals across studies. A tight clustering algorithm is applied to the detected biomarkers to capture differential meta-patterns that can guide further biological investigation. Simulations and three examples (a microarray dataset from metabolism-related knockout mice, an RNA-seq dataset from HIV transgenic rats, and cross-platform datasets from human breast cancer) are used to demonstrate the performance of the proposed method.
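To make the idea of a gene being DE in only a subset of studies concrete, the following is a minimal Python sketch. It is not the authors' model: it assumes a toy two-group mixture for per-study z-scores, with a hypothetical prior DE probability (`prior_de`) and a hypothetical effect size (`effect`), and treats studies independently. It only illustrates how a latent DE indicator per study yields a meta-pattern such as up-regulated in study 1 and unchanged in studies 2 and 3.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_de_prob(z, prior_de=0.2, effect=3.0):
    """Posterior probability that a study is DE given its z-score.

    Toy two-group model (an assumption for illustration only):
      null:        z ~ N(0, 1)
      alternative: z ~ 0.5 * N(+effect, 1) + 0.5 * N(-effect, 1)
    """
    null = (1.0 - prior_de) * normal_pdf(z)
    alt = prior_de * 0.5 * (normal_pdf(z, effect) + normal_pdf(z, -effect))
    return alt / (null + alt)


def meta_pattern(z_scores, threshold=0.5, prior_de=0.2, effect=3.0):
    """Return per-study posterior DE probabilities and a signed pattern.

    Pattern entries: +1 (up-regulated), -1 (down-regulated), 0 (not DE).
    """
    probs = [posterior_de_prob(z, prior_de, effect) for z in z_scores]
    pattern = []
    for z, p in zip(z_scores, probs):
        if p < threshold:
            pattern.append(0)
        else:
            pattern.append(1 if z > 0 else -1)
    return probs, pattern


# Hypothetical z-scores for one gene across three studies.
z_homogeneous = [3.5, 3.1, 4.0]   # strong signal in every study
z_subset = [3.8, 0.2, -0.4]       # signal in the first study only

for name, zs in [("homogeneous", z_homogeneous), ("subset", z_subset)]:
    probs, pattern = meta_pattern(zs)
    print(name, [round(p, 3) for p in probs], pattern)
```

With these illustrative inputs the first gene receives the pattern `[1, 1, 1]` and the second `[1, 0, 0]`. Genes sharing a pattern could then be grouped, which is the role the abstract assigns to the tight clustering step; the paper's actual model is hierarchical and shares information across genes and studies, which this sketch does not attempt.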

Article information

Ann. Appl. Stat., Volume 13, Number 1 (2019), 340-366.

Received: September 2016
Revised: February 2018
First available in Project Euclid: 10 April 2019

Keywords: Transcriptomic differential analysis; meta-analysis; Bayesian hierarchical model; Dirichlet process


Huo, Zhiguang; Song, Chi; Tseng, George. Bayesian latent hierarchical model for transcriptomic meta-analysis to detect biomarkers with clustered meta-patterns of differential expression signals. Ann. Appl. Stat. 13 (2019), no. 1, 340–366. doi:10.1214/18-AOAS1188.




Supplemental materials

  • Supplementary information. Additional tables, figures, and text.
  • Supplementary Excel file 1. Pathway information for the mouse metabolism application.
  • Supplementary Excel file 2. Pathway information for the HIV transgenic rat application.