Estimating the occurrence rate of DNA palindromes

I-Ping Tu; Shao-Hsuan Wang; Yuan-Fu Huang

doi:10.1214/12-AOAS622



Ann. Appl. Stat. 7(2): 1095-1110 (June 2013). DOI: 10.1214/12-AOAS622

Abstract

A DNA palindrome is a segment of letters along a DNA sequence with inversion symmetry: the sequence on one strand is identical to the complementary strand read in the opposite direction. Searching for nonrandom clusters of DNA palindromes, an interesting bioinformatic problem, relies on estimating the null palindrome occurrence rate. The most commonly used approach for estimating this rate is the average rate method. However, we observed that the average rate could exceed the actual rate by 50% when 5000 bp hot-spot regions with a 15-fold rate were inserted into a simulated 150,000 bp genome sequence. Here, we propose a Markov-based estimator that avoids counting palindromes directly and thus reduces the impact of hot-spots. Our simulation shows that this method is more robust against the hot-spot effect than the average rate method. Furthermore, this method can be generalized to either a higher order Markov model or a segmented Markov model, and extended to calculate the occurrence rate for palindromes with gaps. We also provide a $p$-value approximation for various scan statistics to test for nonrandom palindrome clusters under a Markov model.
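To make the contrast between the two estimators more concrete, here is a minimal Python sketch under stated assumptions. It treats a palindrome as a window of fixed length 2L that equals its own reverse complement, computes the average rate (observed palindromic windows per start position), and computes the probability that a window is palindromic under a first-order Markov chain fitted to the same sequence, using empirical base frequencies as the initial distribution. The function names (`is_palindrome`, `average_rate`, `fit_markov`, `markov_rate`, `brute_force_rate`) are hypothetical. This is one straightforward way to evaluate a Markov palindrome probability, not necessarily the authors' estimator; it ignores gapped palindromes and higher order or segmented models, and it assumes an uppercase A/C/G/T sequence.

```python
from itertools import product

COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}
BASES = "ACGT"


def is_palindrome(window):
    """A DNA palindrome equals its own reverse complement."""
    return window == "".join(COMP[b] for b in reversed(window))


def average_rate(seq, half_len):
    """Average rate method: palindromic windows of length 2*half_len per start position."""
    n = 2 * half_len
    starts = len(seq) - n + 1
    hits = sum(is_palindrome(seq[i:i + n]) for i in range(starts))
    return hits / starts


def fit_markov(seq):
    """Fit a first-order chain (assumption): base frequencies and transition probabilities."""
    counts = {a: {b: 0 for b in BASES} for a in BASES}
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1
    trans = {}
    for a in BASES:
        total = sum(counts[a].values())
        # Fall back to uniform transitions for a base never followed by another base.
        trans[a] = {b: counts[a][b] / total if total else 0.25 for b in BASES}
    freq = {a: seq.count(a) / len(seq) for a in BASES}
    return freq, trans


def markov_rate(freq, trans, half_len):
    """P(a window of length 2*half_len is a palindrome) under the fitted chain.

    The window is w1..wL comp(wL)..comp(w1), so each left-half step a -> b is
    mirrored by the right-half step comp(b) -> comp(a); both are multiplied in
    at once, giving a recursion over the left half only.
    """
    v = dict(freq)  # v[a]: weight of left halves built so far that end in a
    for _ in range(half_len - 1):
        v = {b: sum(v[a] * trans[a][b] * trans[COMP[b]][COMP[a]] for a in BASES)
             for b in BASES}
    # Centre of the palindrome: wL must be followed by comp(wL).
    return sum(v[a] * trans[a][COMP[a]] for a in BASES)


def brute_force_rate(freq, trans, half_len):
    """Same probability by enumerating all 4**(2*half_len) windows (small L only)."""
    total = 0.0
    for letters in product(BASES, repeat=2 * half_len):
        window = "".join(letters)
        if is_palindrome(window):
            p = freq[window[0]]
            for a, b in zip(window, window[1:]):
                p *= trans[a][b]
            total += p
    return total
```

For a sequence `seq`, `average_rate(seq, 5)` and `markov_rate(*fit_markov(seq), 5)` give two per-position estimates for 10 bp palindromes. The Markov estimate depends on the sequence only through base and dinucleotide frequencies, which illustrates the abstract's point that avoiding direct palindrome counts reduces the influence of hot-spots; `brute_force_rate` is included only to check `markov_rate` for small L.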

Citation


I-Ping Tu, Shao-Hsuan Wang, Yuan-Fu Huang. "Estimating the occurrence rate of DNA palindromes." Ann. Appl. Stat. 7 (2) 1095-1110, June 2013. https://doi.org/10.1214/12-AOAS622

Information

Published: June 2013

First available in Project Euclid: 27 June 2013

zbMATH: 1288.62170

MathSciNet: MR3113502

Digital Object Identifier: 10.1214/12-AOAS622

Keywords: $p$-value, DNA palindrome, genome sequence, hairpin structure, higher order Markov model, hot-spot, Markov model, occurrence rate, Poisson process, power, segmented Markov model


Journal article, 16 pages.
