The Annals of Probability

A comparison of scores of two protein structures with foldings

Tiefeng Jiang
Source: Ann. Probab. Volume 30, Number 4 (2002), 1893-1912.

Abstract

Let $\{X_i;\, i\geq 1\},\,\{Y_i;\,i\geq 1\},\,\{U, U_i;\, i\geq 1\}$ and $\{V, V_i;\, i\geq 1\}$ be four i.i.d. sequences of random variables. Suppose U and V are uniformly distributed on $[0,1]^3.$ For each realization of $\{U_j;\, 1\leq j\leq n\},\ \{X_{i,p};\, 1\leq p \leq n\}$ is constructed as a certain permutation of $\{X_p;\, 1\leq p\leq n\}$ for any $1\leq i \leq n.$ Also, $\{Y_{j,p};\, 1\leq p \leq n\}, 1\leq j\leq n,$ are constructed the same way, based on $\{Y_j\}$ and $\{V_j\}.$ For a score function F, we show that \begin{eqnarray*} W_n:= \max_{1\leq i, j,m \leq n}\sum_{p=1}^m F(X_{i,p},Y_{j, p}) \end{eqnarray*} has an asymptotic extreme-value distribution with the same parameters as in the one-dimensional case. This model is constructed for a comparison of scores of protein structures with foldings.
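As a concrete reading of the definition of $W_n$, the following is a minimal brute-force sketch for small $n$. It rests on an assumption: the abstract says only that $X_{i,\cdot}$ is "a certain permutation" of the $X_p$ determined by the $U_j$, so here each $X_{i,\cdot}$ hypothetically lists the $X_p$ in order of increasing Euclidean distance of $U_p$ from $U_i$ (and likewise for $Y_{j,\cdot}$ and the $V_j$). The helper names `order_by_distance`, `match_mismatch` and `max_score`, the alphabet, and the $\pm 1$ score $F$ are illustrative choices, not taken from the paper.

```python
import math
import random


def order_by_distance(points, i):
    """Indices p = 0..n-1 sorted by Euclidean distance from points[i] (ties by index).

    Hypothetical folding rule: the paper's actual permutation is not reproduced here.
    """
    return sorted(range(len(points)), key=lambda p: (math.dist(points[i], points[p]), p))


def match_mismatch(a, b):
    """Illustrative score F: +1 for identical letters, -1 otherwise."""
    return 1 if a == b else -1


def max_score(xs, ys, us, vs, score):
    """Brute-force W_n = max over i, j, m of sum_{p=1}^{m} F(X_{i,p}, Y_{j,p})."""
    n = len(xs)
    # Row i holds X_{i,1}, ..., X_{i,n}; row j holds Y_{j,1}, ..., Y_{j,n}.
    x_rows = [[xs[p] for p in order_by_distance(us, i)] for i in range(n)]
    y_rows = [[ys[p] for p in order_by_distance(vs, j)] for j in range(n)]
    best = -math.inf
    for x_row in x_rows:
        for y_row in y_rows:
            partial = 0
            # Each prefix length m = 1..n is a candidate for the maximum.
            for a, b in zip(x_row, y_row):
                partial += score(a, b)
                best = max(best, partial)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    n = 40
    alphabet = "ACDEFGHIKL"  # hypothetical 10-letter alphabet
    xs = [rng.choice(alphabet) for _ in range(n)]
    ys = [rng.choice(alphabet) for _ in range(n)]
    us = [tuple(rng.random() for _ in range(3)) for _ in range(n)]
    vs = [tuple(rng.random() for _ in range(3)) for _ in range(n)]
    print(max_score(xs, ys, us, vs, match_mismatch))
```

The triple loop costs on the order of $n^3$ score evaluations, so this is only meant for checking the definition on small examples. With a 10-letter alphabet the illustrative score has negative mean ($1/10 - 9/10$), so long prefixes tend to drift downward and $W_n$ is driven by rare high-scoring stretches.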

Primary Subjects: 60F10, 60B10
Full-text: Open access
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aop/1039548375
Digital Object Identifier: doi:10.1214/aop/1039548375
Mathematical Reviews number (MathSciNet): MR1944009
Zentralblatt MATH identifier: 1020.60015

MINNEAPOLIS, MINNESOTA 55455 E-MAIL: tjiang@stat.umn.edu.

2012 © Institute of Mathematical Statistics