The Annals of Probability

A comparison of scores of two protein structures with foldings

Tiefeng Jiang


Abstract

Let $\{X_i;\, i\geq 1\}$, $\{Y_i;\, i\geq 1\}$, $\{U, U_i;\, i\geq 1\}$ and $\{V, V_i;\, i\geq 1\}$ be four i.i.d. sequences of random variables. Suppose $U$ and $V$ are uniformly distributed on $[0,1]^3$. For each realization of $\{U_j;\, 1\leq j\leq n\}$, the sequence $\{X_{i,p};\, 1\leq p\leq n\}$ is constructed as a certain permutation of $\{X_p;\, 1\leq p\leq n\}$ for each $1\leq i\leq n$. The sequences $\{Y_{j,p};\, 1\leq p\leq n\}$, $1\leq j\leq n$, are constructed in the same way from $\{Y_j\}$ and $\{V_j\}$. For a score function $F$, we show that
$$W_n := \max_{1\leq i,j,m\leq n} \sum_{p=1}^{m} F(X_{i,p}, Y_{j,p})$$
has an asymptotic extreme-value distribution with the same parameters as in the one-dimensional case. This model is constructed to compare scores of two protein structures with foldings.
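For intuition, $W_n$ can be evaluated directly from its definition once the permuted rows are known. The following is a minimal Python sketch, not the paper's method: it assumes the rows $X_{i,p}$ and $Y_{j,p}$ are already given as $n\times n$ arrays, and the function name `max_folded_score` is hypothetical. The construction of the permutations from $\{U_j\}$ and $\{V_j\}$ is not reproduced, because the abstract does not specify it; the toy example uses cyclic shifts purely as a placeholder. The function enumerates every pair $(i, j)$ and tracks the running partial sums over $m$, so it uses $O(n^3)$ evaluations of $F$.

```python
from typing import Callable, Sequence


def max_folded_score(
    x_rows: Sequence[Sequence[float]],
    y_rows: Sequence[Sequence[float]],
    score: Callable[[float, float], float],
) -> float:
    """Return W_n = max over i, j, m of sum_{p=1}^{m} F(X_{i,p}, Y_{j,p}).

    x_rows[i][p] plays the role of X_{i+1,p+1} (0-based indexing here);
    y_rows[j][p] plays the role of Y_{j+1,p+1}. Both must be n x n.
    """
    n = len(x_rows)
    if n == 0 or len(y_rows) != n:
        raise ValueError("need n >= 1 rows for both X and Y")
    if any(len(row) != n for row in list(x_rows) + list(y_rows)):
        raise ValueError("every row must have length n")

    best = float("-inf")
    for xi in x_rows:          # index i
        for yj in y_rows:      # index j
            partial = 0.0
            # partial equals sum_{p=1}^{m} F(X_{i,p}, Y_{j,p}) after step m;
            # m ranges over 1..n, so the empty sum is never considered.
            for p in range(n):
                partial += score(xi[p], yj[p])
                if partial > best:
                    best = partial
    return best


# Toy usage. The cyclic shifts are a HYPOTHETICAL stand-in for the paper's
# permutations, which are built from U_1..U_n (resp. V_1..V_n).
x = [1.0, -1.0, 2.0]
y = [1.0, 1.0, -1.0]
x_rows = [x[i:] + x[:i] for i in range(len(x))]
y_rows = [y[j:] + y[:j] for j in range(len(y))]
print(max_folded_score(x_rows, y_rows, lambda a, b: a * b))  # 4.0
```

Here the score $F(a, b) = ab$ is also only an illustrative choice; any real-valued score function can be passed as `score`.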

Article information

Source
Ann. Probab., Volume 30, Number 4 (2002), 1893-1912.

Dates
First available in Project Euclid: 10 December 2002

Permanent link to this document
https://projecteuclid.org/euclid.aop/1039548375

Digital Object Identifier
doi:10.1214/aop/1039548375

Mathematical Reviews number (MathSciNet)
MR1944009

Zentralblatt MATH identifier
1020.60015

Subjects
Primary: 60F10: Large deviations; 60B10: Convergence of probability measures

Keywords
Maxima; Chen-Stein method; large deviations

Citation

Jiang, Tiefeng. A comparison of scores of two protein structures with foldings. Ann. Probab. 30 (2002), no. 4, 1893--1912. doi:10.1214/aop/1039548375. https://projecteuclid.org/euclid.aop/1039548375

Minneapolis, Minnesota 55455. E-mail: tjiang@stat.umn.edu