Open Access
September, 1994
A Poisson Approximation for Sequence Comparisons with Insertions and Deletions
Claudia Neuhauser
Ann. Statist. 22(3): 1603-1629 (September, 1994). DOI: 10.1214/aos/1176325645

Abstract

We construct a statistical test for a sequence alignment problem which enables us to decide whether two given sequences are related. Such a test can be used in DNA and protein sequence comparisons. It is based on a comparison of two long sequences of i.i.d. letters taken from a finite alphabet. The test statistic typically employed is the length of the longest matching region between the two sequences in which a certain number of insertions and deletions but no mismatches are allowed. We give a distributional result which enables one to compute $P$-values, and hence to decide whether or not the two sequences are related. Its proof utilizes the Chen-Stein method for Poisson approximation. The test is based on a greedy algorithm that searches for the longest matching region. We show that this algorithm finds the longest matching region with probability approaching 1 as the lengths of the two sequences go to infinity.
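
The statistic described above can be illustrated with a small sketch. It is not the paper's greedy algorithm or its Poisson approximation: it is a plain dynamic program that finds the largest number of matched letters in a local alignment with no mismatches and at most a fixed number of skipped letters, followed by a simulation-based $P$-value under the assumption of i.i.d. uniform letters. The function names, the `max_indels` parameter, the `ACGT` alphabet and the uniform letter distribution are all illustrative assumptions.

```python
import random


def longest_match_with_indels(a, b, max_indels):
    """Largest number of matched letters in a local alignment of a and b
    that contains no mismatches and at most max_indels skipped letters.
    Hypothetical helper, solved by exhaustive dynamic programming."""
    m = len(b)
    # prev[g][j]: best score of a region ending at a[:i-1], b[:j] using g indels
    prev = [[0] * (m + 1) for _ in range(max_indels + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [[0] * (m + 1) for _ in range(max_indels + 1)]
        for g in range(max_indels + 1):
            for j in range(1, m + 1):
                score = 0  # a mismatch with no indel to spend restarts the region
                if a[i - 1] == b[j - 1]:
                    score = prev[g][j - 1] + 1  # extend the region by one match
                if g > 0:
                    # spend one indel: skip a letter of a, or skip a letter of b
                    score = max(score, prev[g - 1][j], cur[g - 1][j - 1])
                cur[g][j] = score
                best = max(best, score)
        prev = cur
    return best


def monte_carlo_p_value(a, b, max_indels, alphabet="ACGT", n_sim=200, seed=0):
    """Estimate P(statistic >= observed) for unrelated sequences by simulation.
    Assumes i.i.d. uniform letters; a stand-in for an analytic approximation."""
    rng = random.Random(seed)
    observed = longest_match_with_indels(a, b, max_indels)
    hits = 0
    for _ in range(n_sim):
        x = "".join(rng.choice(alphabet) for _ in range(len(a)))
        y = "".join(rng.choice(alphabet) for _ in range(len(b)))
        if longest_match_with_indels(x, y, max_indels) >= observed:
            hits += 1
    # add-one correction keeps the estimate strictly positive
    return (hits + 1) / (n_sim + 1)
```

For the sequences `"ACGTACGT"` and `"ACGTTACGT"`, `longest_match_with_indels` returns 5 when no indels are allowed and 8 when one is allowed. The simulation is only practical for short sequences, which is the kind of limitation an analytic distributional result avoids.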

Citation

Claudia Neuhauser. "A Poisson Approximation for Sequence Comparisons with Insertions and Deletions." Ann. Statist. 22(3): 1603-1629, September, 1994. https://doi.org/10.1214/aos/1176325645

Information

Published: September, 1994
First available in Project Euclid: 11 April 2007

zbMATH: 0817.62013
MathSciNet: MR1311992
Digital Object Identifier: 10.1214/aos/1176325645

Subjects:
Primary: 62F05
Secondary: 92D20

Keywords: Chen-Stein method, DNA sequences, greedy algorithm, Poisson approximation, Sequence matching

Rights: Copyright © 1994 Institute of Mathematical Statistics
