Abstract
We show in general how the substitution matrix and gap penalty function for local sequence alignments can be chosen such that the score statistic grows at a logarithmic rate when the two sequences are unrelated. The method used is the construction of a mixture distribution in which sequences with large scores are generated with uniformly higher likelihood. This distribution is also used for importance sampling of the $p$-value of the score. An upper bound on this $p$-value is computed and compared against the simulated value.
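To give a feel for the importance sampling idea mentioned in the abstract, the following is a minimal sketch on a much simpler toy problem, not the mixture distribution constructed in the paper. It assumes an ungapped comparison of fixed length `n` in which each position matches independently with probability `p = 0.25` under the null model, and estimates the small tail probability that at least `k_min` positions match. Samples are drawn from a hypothetical alternative model with match probability `q`, under which the rare event is common, and each sample is reweighted by the likelihood ratio. The function names and all parameter values are illustrative assumptions.

```python
import math
import random


def exact_tail(n, p, k_min):
    """Exact P(K >= k_min) for K ~ Binomial(n, p), used only as a reference."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1)
    )


def importance_sampling_tail(n, p, q, k_min, num_samples, rng):
    """Estimate P(K >= k_min) under match probability p by sampling under q.

    Each sample is weighted by the likelihood ratio of the null model (p)
    to the sampling model (q), which keeps the estimator unbiased.
    """
    total = 0.0
    for _ in range(num_samples):
        # Number of matching positions drawn under the sampling model q.
        k = sum(1 for _ in range(n) if rng.random() < q)
        if k >= k_min:
            total += (p / q) ** k * ((1 - p) / (1 - q)) ** (n - k)
    return total / num_samples


rng = random.Random(0)  # fixed seed so the run is reproducible
n, p, q, k_min = 20, 0.25, 0.75, 15  # toy values, not taken from the paper
estimate = importance_sampling_tail(n, p, q, k_min, 20000, rng)
exact = exact_tail(n, p, k_min)
print(f"importance sampling estimate: {estimate:.3e}")
print(f"exact tail probability:       {exact:.3e}")
```

Direct simulation under the null model would almost never observe the event at this sample size, since its probability is of order $10^{-6}$; sampling under `q` makes it occur in a majority of draws, and the weights correct for the change of distribution.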
Citation
Hock Peng Chan. "Upper bounds and importance sampling of p-values for DNA and protein sequence alignments." Bernoulli 9 (2) 183-199, April 2003. https://doi.org/10.3150/bj/1068128974
Information