The Annals of Applied Statistics

Multiple testing of local maxima for detection of peaks in ChIP-Seq data

Armin Schwartzman, Andrew Jaffe, Yulia Gavrilov, and Clifford A. Meyer


Abstract

A topological multiple testing approach to peak detection is proposed for the problem of detecting transcription factor binding sites in ChIP-Seq data. After kernel smoothing of the tag counts over the genome, the presence of a peak is tested at each observed local maximum, followed by multiple testing correction at the desired false discovery rate level. Valid $p$-values for candidate peaks are computed via Monte Carlo simulations of smoothed Poisson sequences, whose background Poisson rates are obtained via linear regression from a Control sample at two different scales. The proposed method identifies nearby binding sites that other methods do not.
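The pipeline described in the abstract (smooth, find local maxima, compute Monte Carlo $p$-values, apply a false discovery rate correction) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a Gaussian smoothing kernel, a constant and known background Poisson rate (the paper instead estimates background rates by linear regression from a Control sample at two scales), empirical $p$-values with a plus-one correction, and the Benjamini-Hochberg procedure. All function names, parameter values, and the simulated data are hypothetical.

```python
import numpy as np


def gaussian_kernel(bandwidth, truncate=4.0):
    """Unit-sum Gaussian kernel (the kernel shape is an assumption)."""
    half = int(np.ceil(truncate * bandwidth))
    x = np.arange(-half, half + 1)
    w = np.exp(-0.5 * (x / bandwidth) ** 2)
    return w / w.sum()


def smooth(counts, kernel):
    """Kernel-smooth binned tag counts along the sequence."""
    return np.convolve(counts, kernel, mode="same")


def local_maxima(y):
    """Indices of interior local maxima (first point of any plateau)."""
    return np.flatnonzero((y[1:-1] > y[:-2]) & (y[1:-1] >= y[2:])) + 1


def null_peak_heights(rate, length, kernel, n_sim, rng):
    """Heights of local maxima of smoothed background-only Poisson sequences."""
    heights = []
    for _ in range(n_sim):
        sim = smooth(rng.poisson(rate, size=length).astype(float), kernel)
        heights.append(sim[local_maxima(sim)])
    return np.sort(np.concatenate(heights))


def peak_p_values(heights, null_heights):
    """Empirical p-values: share of null heights >= observed, plus-one corrected."""
    n_ge = null_heights.size - np.searchsorted(null_heights, heights, side="left")
    return (n_ge + 1) / (null_heights.size + 1)


def benjamini_hochberg(p, alpha):
    """Boolean mask of hypotheses rejected by the BH step-up procedure."""
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.flatnonzero(passed).max()  # largest rank meeting its threshold
        reject[order[: k + 1]] = True
    return reject


# Simulated example: two nearby binding sites and one isolated site.
rng = np.random.default_rng(0)
n_bins, background, bandwidth = 5000, 0.2, 5.0
sites = np.array([1500, 1560, 3500])
positions = np.arange(n_bins)
rate = np.full(n_bins, background)
for s in sites:
    bump = np.exp(-0.5 * ((positions - s) / bandwidth) ** 2)
    rate += 80.0 * bump / (bandwidth * np.sqrt(2 * np.pi))  # about 80 extra tags per site

kernel = gaussian_kernel(bandwidth)
smoothed = smooth(rng.poisson(rate).astype(float), kernel)
candidates = local_maxima(smoothed)  # one test per observed local maximum
null = null_peak_heights(background, n_bins, kernel, n_sim=200, rng=rng)
p_values = peak_p_values(smoothed[candidates], null)
detected = candidates[benjamini_hochberg(p_values, alpha=0.05)]
print(detected)
```

Testing only at local maxima, rather than at every genomic position, keeps the number of hypotheses small and ties each rejection to a single candidate location, which is what lets closely spaced sites appear as separate detections in this sketch.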

Article information

Source
Ann. Appl. Stat., Volume 7, Number 1 (2013), 471-494.

Dates
First available in Project Euclid: 9 April 2013

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1365527207

Digital Object Identifier
doi:10.1214/12-AOAS594

Mathematical Reviews number (MathSciNet)
MR3086427

Zentralblatt MATH identifier
06171280

Keywords
False discovery rate, kernel smoothing, matched filter, Poisson sequence, topological inference

Citation

Schwartzman, Armin; Jaffe, Andrew; Gavrilov, Yulia; Meyer, Clifford A. Multiple testing of local maxima for detection of peaks in ChIP-Seq data. Ann. Appl. Stat. 7 (2013), no. 1, 471–494. doi:10.1214/12-AOAS594. https://projecteuclid.org/euclid.aoas/1365527207


