The Annals of Applied Statistics
- Ann. Appl. Stat.
- Volume 4, Number 4 (2010), 1660-1697.
Subsampling methods for genomic inference
Peter J. Bickel, Nathan Boley, James B. Brown, Haiyan Huang, and Nancy R. Zhang
Abstract
Large-scale statistical analysis of data sets associated with genome sequences plays an important role in modern biology. A key component of such statistical analyses is the computation of p-values and confidence bounds for statistics defined on the genome. Currently, such computation is commonly achieved through ad hoc simulation measures. The method of randomization, which is at the heart of these simulation procedures, can significantly affect the resulting statistical conclusions. Most simulation schemes introduce a variety of hidden assumptions regarding the nature of the randomness in the data, resulting in a failure to capture biologically meaningful relationships. To address the need for a method of assessing the significance of observations within large-scale genomic studies, where there often exists a complex dependency structure between observations, we propose a unified solution built upon a data subsampling approach. We propose a piecewise stationary model for genome sequences and show that the subsampling approach gives correct answers under this model. We illustrate the method on three simulation studies and two real data examples.
Article information
Source
Ann. Appl. Stat. Volume 4, Number 4 (2010), 1660-1697.
Dates
First available in Project Euclid: 4 January 2011
Permanent link to this document
http://projecteuclid.org/euclid.aoas/1294167794
Digital Object Identifier
doi:10.1214/10-AOAS363
Mathematical Reviews number (MathSciNet)
MR2829932
Zentralblatt MATH identifier
1220.62130
Keywords
Genome Structure Correction (GSC); subsampling; piecewise stationary model; segmentation-block bootstrap; feature overlap
Citation
Bickel, Peter J.; Boley, Nathan; Brown, James B.; Huang, Haiyan; Zhang, Nancy R. Subsampling methods for genomic inference. Ann. Appl. Stat. 4 (2010), no. 4, 1660-1697. doi:10.1214/10-AOAS363. http://projecteuclid.org/euclid.aoas/1294167794.
Supplemental materials
- Supplementary material: Some theorems in subsampling methods for genomic inference. In the Supplementary Material, we provide theoretical proofs of the theorems presented in the main text. Digital Object Identifier: doi:10.1214/10-AOAS363SUPP

