## The Annals of Statistics

### Markov jump processes in modeling coalescent with recombination

#### Abstract

Genetic recombination is one of the most important mechanisms that generate and maintain diversity, and recombination information plays an important role in population genetic studies. However, recombination is an extremely complex phenomenon, and hence simulation methods are indispensable for statistical inference on recombination. Two classes of simulation models are currently in wide practical use: back-in-time models and spatially moving models. However, the statistical properties shared by the two classes have not yet been studied theoretically. Based on our joint research with the CAS-MPG Partner Institute for Computational Biology and with Beijing Jiaotong University, in this paper we provide for the first time a rigorous argument that the statistical properties of the two classes of simulation models are identical. That is, they share the same probability distribution on the space of ancestral recombination graphs (ARGs). As a consequence, our study provides a unified interpretation of the algorithms for simulating the coalescent with recombination, and will facilitate the study of statistical inference on recombination.

#### Article information

Source
Ann. Statist., Volume 42, Number 4 (2014), 1361–1393.

Dates
First available in Project Euclid: 25 June 2014

Permanent link to this document
https://projecteuclid.org/euclid.aos/1403715204

Digital Object Identifier
doi:10.1214/14-AOS1227

Mathematical Reviews number (MathSciNet)
MR3226160

Zentralblatt MATH identifier
1319.60163

#### Citation

Chen, Xian; Ma, Zhi-Ming; Wang, Ying. Markov jump processes in modeling coalescent with recombination. Ann. Statist. 42 (2014), no. 4, 1361–1393. doi:10.1214/14-AOS1227. https://projecteuclid.org/euclid.aos/1403715204

#### References

• [1] Chen, G. K., Marjoram, P. and Wall, J. D. (2009). Fast and flexible simulation of DNA sequence data. Genome Res. 19 136–142.
• [2] Chen, M.-F. (2004). From Markov Chains to Non-Equilibrium Particle Systems, 2nd ed. World Scientific, River Edge, NJ.
• [3] Chen, X. and Ma, Z. M. (2014). A transformation of Markov jump processes and applications in genetic study. Discrete Contin. Dyn. Syst. Ser. A. 34 To appear.
• [4] Chen, X., Ma, Z. and Wang, Y. (2014). Supplement to “Markov jump processes in modeling coalescent with recombination.” DOI:10.1214/14-AOS1227SUPP.
• [5] Cohn, D. L. (1980). Measure Theory. Birkhäuser, Boston, MA.
• [6] Ethier, S. N. and Kurtz, T. G. (1986). Markov Processes: Characterization and Convergence. Wiley, New York.
• [7] Griffiths, R. C. and Marjoram, P. (1996). Ancestral inference from samples of DNA sequences with recombination. J. Comput. Biol. 3 479–502.
• [8] Griffiths, R. C. and Marjoram, P. (1997). An ancestral recombination graph. In Progress in Population Genetics and Human Evolution (P. Donnelly and S. Tavaré, eds.). The IMA Volumes in Mathematics and Its Applications 87 100–117. Springer, Berlin.
• [9] He, S. W., Wang, J. G. and Yan, J. A. (1992). Semimartingale Theory and Stochastic Calculus. Science Press, Beijing.
• [10] Hudson, R. R. (1983). Properties of a neutral allele model with intragenic recombination. Theor. Popul. Biol. 23 183–201.
• [11] Hudson, R. R. (2002). Generating samples under a Wright–Fisher neutral model of genetic variation. Bioinformatics 18 337–338.
• [12] Kingman, J. F. C. (1982). The coalescent. Stochastic Process. Appl. 13 235–248.
• [13] Kingman, J. F. C. (1982). On the genealogy of large populations. J. Appl. Probab. 19A 27–43.
• [14] Marjoram, P. and Wall, J. D. (2006). Fast “coalescent” simulation. BMC Genet. 7 16.
• [15] McVean, G. A. T. and Cardin, N. J. (2005). Approximating the coalescent with recombination. Philos. Trans. R. Soc. Lond., B, Biol. Sci. 360 1387–1393.
• [16] Wang, Y., Zhou, Y., Li, L. F., Chen, X., Liu, Y. T., Ma, Z. M. and Xu, S. H. (2013). A new method for modeling coalescent processes with recombination. Preprint.
• [17] Watterson, G. A. (1975). On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7 256–276.
• [18] Wiuf, C. and Hein, J. (1999). Recombination as a point process along sequences. Theor. Popul. Biol. 55 248–259.

#### Supplemental materials

• Supplementary material: Supplement to “Markov jump processes in modeling coalescent with recombination”. The supplementary file is divided into two appendices. Appendix A contains the proofs of Propositions 1–9 and Propositions 11–13. Appendix B is devoted to the calculation of the conditional distribution $P(T_{j+1}^{i+1}\in B,\xi_{j+1}^{i+1}=\vec{\xi}\mid X^{S_{i}},S_{i+1},T_{0}^{i+1},\xi_{0}^{i+1},\ldots,T_{j}^{i+1},\xi_{j}^{i+1})$. In particular, the proofs of Theorems 5, 6 and 7 are given in the proofs of Theorems B.10, B.11 and B.12, respectively, in Appendix B.