## The Annals of Applied Statistics

### A novel spectral method for inferring general diploid selection from time series genetic data

#### Abstract

The increased availability of time series genetic variation data from experimental evolution studies and ancient DNA samples has created new opportunities to identify genomic regions under selective pressure and to estimate their associated fitness parameters. However, computing the likelihood of nonneutral models for the population allele frequency dynamics, given the observed temporal DNA data, is a challenging problem. Here, we develop a novel spectral algorithm to analytically and efficiently integrate over all possible frequency trajectories between consecutive time points. This advance circumvents the limitations of existing methods, which require fine-tuning the discretization of the population allele frequency space when numerically approximating the requisite integrals. Furthermore, our method is flexible enough to handle general diploid models of selection in which the heterozygote and homozygote fitness parameters can take any values, whereas previous methods focused on only a few restricted models of selection. We demonstrate the utility of our method on simulated data and also apply it to analyze ancient DNA data from genetic loci associated with coat coloration in horses. In contrast to previous studies, our exploration of the full fitness parameter space reveals that a heterozygote advantage form of balancing selection may have been acting on these loci.
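To illustrate what "general diploid selection" and "heterozygote advantage" mean in this setting, the following is a minimal sketch of the textbook deterministic allele frequency recursion with independent heterozygote and homozygote selection coefficients. It is not the spectral algorithm of the paper, and it ignores genetic drift and sampling entirely; the function names and the parameters `s_het` and `s_hom` are assumptions introduced here for illustration.

```python
def next_frequency(p, s_het, s_hom):
    """One generation of deterministic selection on allele A at frequency p.

    Assumed (hypothetical) parameterization: genotype fitnesses are
    AA = 1 + s_hom, Aa = 1 + s_het, aa = 1. Drift is ignored.
    """
    q = 1.0 - p
    w_hom, w_het, w_ref = 1.0 + s_hom, 1.0 + s_het, 1.0
    # Mean fitness under Hardy-Weinberg genotype proportions.
    mean_fitness = p * p * w_hom + 2.0 * p * q * w_het + q * q * w_ref
    # Allele A is weighted by its marginal fitness p*w_hom + q*w_het.
    return p * (p * w_hom + q * w_het) / mean_fitness


def trajectory(p0, s_het, s_hom, generations):
    """Iterate the recursion and return the list of frequencies."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s_het, s_hom))
    return freqs


def interior_equilibrium(s_het, s_hom):
    """Frequency at which both alleles have equal marginal fitness.

    It lies in (0, 1) and is stable when s_het > max(s_hom, 0),
    i.e. under heterozygote advantage.
    """
    return s_het / (2.0 * s_het - s_hom)


if __name__ == "__main__":
    # Heterozygote advantage: the allele neither fixes nor is lost.
    final = trajectory(0.05, s_het=0.10, s_hom=0.05, generations=2000)[-1]
    print(final, interior_equilibrium(0.10, 0.05))
```

Under heterozygote advantage the frequency settles at an intermediate value rather than going to 0 or 1, which is the balancing behavior the abstract refers to. Restricted models that tie `s_het` to `s_hom` (for example, additive selection with `s_het = s_hom / 2`) cannot produce this outcome.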

#### Article information

Source
Ann. Appl. Stat., Volume 8, Number 4 (2014), 2203-2222.

Dates
First available in Project Euclid: 19 December 2014

https://projecteuclid.org/euclid.aoas/1419001740

Digital Object Identifier
doi:10.1214/14-AOAS764

Mathematical Reviews number (MathSciNet)
MR3292494

Zentralblatt MATH identifier
06408775

#### Citation

Steinrücken, Matthias; Bhaskar, Anand; Song, Yun S. A novel spectral method for inferring general diploid selection from time series genetic data. Ann. Appl. Stat. 8 (2014), no. 4, 2203–2222. doi:10.1214/14-AOAS764. https://projecteuclid.org/euclid.aoas/1419001740

#### Supplemental materials

• Supplementary material: A novel spectral method for inferring general diploid selection from time series genetic data. We provide proofs of the results stated in Section 2. The modified Jacobi polynomials appearing in this paper are defined and some of their key properties are listed. Also, the coefficients in the definition of the matrix $\mathbf{M}$ in equation (2.14) are provided. Last, we describe some alternate density functions for the allele frequency at the time when selection arises.