The Annals of Applied Statistics

Doubly stochastic continuous-time hidden Markov approach for analyzing genome tiling arrays

W. Evan Johnson, X. Shirley Liu, and Jun S. Liu
Source: Ann. Appl. Stat. Volume 3, Number 3 (2009), 1183-1203.

Abstract

Microarrays have been developed that tile the entire nonrepetitive genomes of many different organisms, allowing for the unbiased mapping of active transcription regions or protein binding sites across the entire genome. These tiling array experiments produce massive, correlated data sets with many experimental artifacts, posing challenges that require innovative analysis methods and efficient computational algorithms. This paper presents a doubly stochastic latent variable analysis method for transcript discovery and protein binding region localization using tiling array data. The model is unique in that it considers the actual genomic distance between probes. Additionally, the model is designed to be robust to cross-hybridized and nonresponsive probes, which can often lead to false-positive results in microarray experiments. We apply our model to a transcript-finding data set to illustrate the consistency of our method. We also apply our method to a spike-in experiment that can serve as a benchmark data set for researchers interested in developing and comparing future tiling array methods. The results indicate that our method is powerful and accurate, and that it can be used on a single sample without control experiments, thus defraying some of the overhead cost of conducting experiments on tiling arrays.


Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aoas/1254773284
Digital Object Identifier: doi:10.1214/09-AOAS248
Zentralblatt MATH identifier: 05758457
Mathematical Reviews number (MathSciNet): MR2750392


2012 © Institute of Mathematical Statistics
