Open Access
Detecting simultaneous variant intervals in aligned sequences
David Siegmund, Benjamin Yakir, Nancy R. Zhang
Ann. Appl. Stat. 5(2A): 645-668 (June 2011). DOI: 10.1214/10-AOAS400


Given a set of aligned sequences of independent noisy observations, we are concerned with detecting intervals where the mean values of the observations change simultaneously in a subset of the sequences. The intervals of changed means are typically short relative to the length of the sequences, the subset where the change occurs, the “carriers,” can be relatively small, and the sizes of the changes can vary from one sequence to another. The problem is motivated by the scientific task of detecting inherited copy number variants in aligned DNA samples. We propose a statistic based on the assumption that for any given interval of changed means there is a given fraction of samples that carry the change. We derive an analytic approximation for the false positive error probability of a scan, which simulations show to be reasonably accurate. We show that the new method usually improves on methods that analyze a single sample at a time and on our earlier multi-sample method, which is most efficient when the carriers form a large fraction of the set of sequences. The proposed procedure is also shown to be robust with respect to the assumed fraction of carriers of the changes.
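The abstract describes scanning intervals across aligned samples with a statistic built on an assumed fraction of carriers. The Python sketch below illustrates one way such a multi-sample scan can be set up; it is not the authors' code. It assumes each sequence is already standardized (baseline mean 0, variance 1), that the per-interval statistic takes the mixture form Σ_i log(1 − p0 + p0·exp(Z_i²/2)), where Z_i is the standardized sum of sample i over the interval and p0 is the assumed carrier fraction, and it uses a naive exhaustive search over intervals up to a hypothetical maximum length `max_len`. All function names and parameters are illustrative; the paper's exact statistic, its handling of the sign of the change, and its analytic false positive approximation are not reproduced here.

```python
import math
from typing import List, Sequence, Tuple


def prefix_sums(xs: Sequence[float]) -> List[float]:
    """Return [0, x0, x0+x1, ...] so that sum(xs[s:t]) == c[t] - c[s]."""
    c = [0.0]
    for x in xs:
        c.append(c[-1] + x)
    return c


def log_mixture_term(z: float, p0: float) -> float:
    """Numerically stable log(1 - p0 + p0 * exp(z**2 / 2)) for 0 < p0 <= 1.

    Factors out exp(a) with a = z**2 / 2 >= 0 so large |z| cannot overflow.
    """
    a = 0.5 * z * z
    return a + math.log(p0 + (1.0 - p0) * math.exp(-a))


def mixture_scan(
    samples: Sequence[Sequence[float]], p0: float, max_len: int
) -> Tuple[float, int, int]:
    """Scan every interval xs[s:t] of length <= max_len across aligned samples.

    Hypothetical illustration: each sample i contributes its standardized sum
    Z_i = sum(xs_i[s:t]) / sqrt(t - s); the interval score is
    sum_i log(1 - p0 + p0 * exp(Z_i**2 / 2)). Returns (best_score, s, t).
    """
    n = len(samples[0])
    cums = [prefix_sums(xs) for xs in samples]
    best = (-math.inf, 0, 0)
    for s in range(n):
        for t in range(s + 1, min(n, s + max_len) + 1):
            root = math.sqrt(t - s)
            score = sum(log_mixture_term((c[t] - c[s]) / root, p0) for c in cums)
            if score > best[0]:
                best = (score, s, t)
    return best


if __name__ == "__main__":
    # Hypothetical toy data: 4 aligned sequences of 20 standardized values,
    # with 2 "carriers" sharing a mean shift of +3 on positions 8..11.
    data = [[0.0] * 20 for _ in range(4)]
    for i in (0, 1):
        for j in range(8, 12):
            data[i][j] = 3.0
    score, s, t = mixture_scan(data, p0=0.1, max_len=10)
    print(f"best interval xs[{s}:{t}], score {score:.3f}")
```

On this noiseless toy input the scan should recover the planted interval `xs[8:12]`, and non-carriers contribute nothing because log(1 − p0 + p0) = 0 when Z_i = 0. The exhaustive search costs on the order of n · max_len · (number of samples) term evaluations, which is adequate for illustration; capping the interval length reflects the abstract's remark that changed intervals are typically short relative to the sequences.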


Citation

David Siegmund, Benjamin Yakir, Nancy R. Zhang. "Detecting simultaneous variant intervals in aligned sequences." Ann. Appl. Stat. 5(2A): 645-668, June 2011.


Published: June 2011
First available in Project Euclid: 13 July 2011

zbMATH: 1223.62166
MathSciNet: MR2840169
Digital Object Identifier: 10.1214/10-AOAS400

Keywords: change-point detection, DNA copy number, scan statistics, segmentation

Rights: Copyright © 2011 Institute of Mathematical Statistics
