The Annals of Statistics

Markov jump processes in modeling coalescent with recombination

Xian Chen, Zhi-Ming Ma, and Ying Wang

Abstract

Genetic recombination is one of the most important mechanisms that generate and maintain diversity, and recombination information plays an important role in population genetic studies. However, the phenomenon of recombination is extremely complex, and hence simulation methods are indispensable in the statistical inference of recombination. Two main classes of simulation models are currently in wide practical use: back-in-time models and spatially moving models. However, the statistical properties shared by the two classes have not yet been studied theoretically. Based on our joint research with the CAS-MPG Partner Institute for Computational Biology and with Beijing Jiaotong University, in this paper we provide, for the first time, a rigorous argument that the statistical properties of the two classes of simulation models are identical; that is, they share the same probability distribution on the space of ancestral recombination graphs (ARGs). As a consequence, our study provides a unified interpretation of the algorithms for simulating the coalescent with recombination, and will facilitate the study of statistical inference on recombination.

Article information

Source
Ann. Statist., Volume 42, Number 4 (2014), 1361-1393.

Dates
First available in Project Euclid: 25 June 2014

Permanent link to this document
https://projecteuclid.org/euclid.aos/1403715204

Digital Object Identifier
doi:10.1214/14-AOS1227

Mathematical Reviews number (MathSciNet)
MR3226160

Zentralblatt MATH identifier
1319.60163

Subjects
Primary: 60J25: Continuous-time Markov processes on general state spaces; 65C60: Computational problems in statistics
Secondary: 92B15: General biostatistics [See also 62P10]; 92D25: Population dynamics (general); 60J75: Jump processes

Keywords
Markov jump process; coalescent process; random sequence; conditional distribution; genetic recombination; ancestral recombination graph; back-in-time algorithm; spatial algorithm

Citation

Chen, Xian; Ma, Zhi-Ming; Wang, Ying. Markov jump processes in modeling coalescent with recombination. Ann. Statist. 42 (2014), no. 4, 1361–1393. doi:10.1214/14-AOS1227. https://projecteuclid.org/euclid.aos/1403715204


Supplemental materials

  • Supplementary material: Supplement to “Markov jump processes in modeling coalescent with recombination”. The supplementary file is divided into two Appendixes. Appendix A contains the proofs of Propositions 1–9 and Propositions 11–13. Appendix B is devoted to the calculation of the conditional distribution $P(T_{j+1}^{i+1}\in B,\xi_{j+1}^{i+1}=\vec{\xi}|X^{S_{i}},S_{i+1},T_{0}^{i+1},\xi^{i+1},\ldots,T_{j}^{i+1},\xi_{j}^{i+1})$. In particular, the proofs of Theorems 5, 6 and 7 are presented, respectively, in the proofs of Theorems B.10, B.11 and B.12 in Appendix B.