Source: Ann. Probab. Volume 30, Number 4 (2002), 1893-1912.
Let $\{X_i;\, i\geq 1\},\,\{Y_i;\,i\geq 1\},\,\{U, U_i;\, i\geq 1\}$ and $\{V, V_i;\, i\geq 1\}$ be four i.i.d. sequences of random variables. Suppose $U$ and $V$ are uniformly distributed on $[0,1]^3$. For each realization of $\{U_j;\, 1\leq j\leq n\}$, $\{X_{i,p};\, 1\leq p \leq n\}$ is constructed as a certain permutation of $\{X_p;\, 1\leq p\leq n\}$ for any $1\leq i \leq n$. Also, $\{Y_{j,p};\, 1\leq p \leq n\}$, $1\leq j\leq n$, are constructed the same way, based on $\{Y_j\}$ and $\{V_j\}$. For a score function $F$, we show that \begin{eqnarray*} W_n:= \max_{1\leq i, j,m \leq n}\sum_{p=1}^m F(X_{i,p},Y_{j, p}) \end{eqnarray*} has an asymptotic extreme distribution with the same parameters as in the one-dimensional case. This model is constructed for a comparison of scores of protein structures with foldings.
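To make the definition of $W_n$ concrete, the following is a minimal brute-force sketch of how the statistic could be evaluated on simulated data. The abstract does not say which permutation is used, so the rule here is an assumption for illustration only: $\{X_{i,p}\}$ lists the $X_p$ in order of increasing Euclidean distance of $U_p$ from $U_i$, and likewise for $\{Y_{j,p}\}$ with the $V_j$. The function names, the four-letter alphabet and the match/mismatch score are also hypothetical.

```python
import random
from itertools import accumulate


def nearest_neighbor_order(points, i):
    """Indices of all points sorted by squared distance from points[i].

    ASSUMPTION: this distance-based ordering stands in for the
    unspecified "certain permutation" in the abstract.
    """
    cx, cy, cz = points[i]

    def squared_distance(q):
        x, y, z = points[q]
        return (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2

    return sorted(range(len(points)), key=squared_distance)


def max_partial_sum_score(xs, ys, us, vs, score):
    """Brute-force W_n = max over i, j, m of sum_{p<=m} F(X_{i,p}, Y_{j,p})."""
    n = len(xs)
    x_orders = [nearest_neighbor_order(us, i) for i in range(n)]
    y_orders = [nearest_neighbor_order(vs, j) for j in range(n)]
    best = float("-inf")
    for x_order in x_orders:          # choice of i
        for y_order in y_orders:      # choice of j
            # Running sums over p = 1..m; their maximum covers every m.
            partial_sums = accumulate(
                score(xs[a], ys[b]) for a, b in zip(x_order, y_order)
            )
            best = max(best, max(partial_sums))
    return best


def match_score(x, y):
    """Hypothetical score function F: +1 for a match, -1 otherwise."""
    return 1 if x == y else -1


if __name__ == "__main__":
    rng = random.Random(0)
    n = 30
    xs = [rng.choice("ACGT") for _ in range(n)]
    ys = [rng.choice("ACGT") for _ in range(n)]
    us = [(rng.random(), rng.random(), rng.random()) for _ in range(n)]
    vs = [(rng.random(), rng.random(), rng.random()) for _ in range(n)]
    print(max_partial_sum_score(xs, ys, us, vs, match_score))
```

The triple loop costs on the order of $n^3$ score evaluations, which is acceptable for small simulations but is not meant as an efficient algorithm.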
MINNEAPOLIS, MINNESOTA 55455 E-MAIL: tjiang@stat.umn.edu.