Open Access
June 2019
Survival analysis of DNA mutation motifs with penalized proportional hazards
Jean Feng, David A. Shaw, Vladimir N. Minin, Noah Simon, Frederick A. Matsen IV
Ann. Appl. Stat. 13(2): 1268-1294 (June 2019). DOI: 10.1214/18-AOAS1233


Antibodies, an essential part of our immune system, develop through an intricate process to bind a wide array of pathogens. This process involves randomly mutating the DNA sequences encoding these antibodies to find variants with improved binding, though mutations are not distributed uniformly across sequence sites. Immunologists observe this nonuniformity to be consistent with “mutation motifs,” which are short DNA subsequences that affect how likely a given site is to experience a mutation. Quantifying the effect of motifs on mutation rates is challenging: the large number of possible motifs makes the statistical problem high dimensional, while the unobserved history of the mutation process leads to a nontrivial missing data problem. We introduce an $\ell_{1}$-penalized proportional hazards model to infer mutation motifs and their effects. To estimate model parameters, our method uses a Monte Carlo EM algorithm to marginalize over the unknown ordering of mutations. We show that our method outperforms current methods on simulated data and leads to more parsimonious models. The application of proportional hazards to mutation processes is, to our knowledge, novel, and it formalizes the current methods in a statistical framework that can be easily extended to analyze the effect of other biological features on mutation rates.
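
To make the modeling idea concrete, the following is a minimal sketch of an $\ell_{1}$-penalized log partial likelihood for a single sequence under one assumed mutation order. It is not the authors' implementation. The motif length of 3, the rule that each site mutates at most once, the baseline hazard for edge positions, and all function names are assumptions made for illustration. The Monte Carlo EM step described in the abstract, which averages over sampled orderings, is not shown.

```python
import math
from itertools import product

K = 3  # assumed motif length: 3-mers centered on the site
MOTIFS = ["".join(p) for p in product("ACGT", repeat=K)]
MOTIF_INDEX = {m: i for i, m in enumerate(MOTIFS)}


def motif_at(seq, pos):
    """Return the K-mer centered at pos, or None near the sequence ends."""
    half = K // 2
    if pos < half or pos + half >= len(seq):
        return None  # assumption: edge sites get the baseline hazard
    return seq[pos - half: pos + half + 1]


def log_hazard(theta, seq, pos):
    """Log relative hazard of a site: the coefficient of its current motif."""
    motif = motif_at(seq, pos)
    return theta[MOTIF_INDEX[motif]] if motif is not None else 0.0


def log_partial_likelihood(theta, start_seq, ordered_mutations):
    """Log partial likelihood for one assumed ordering of mutations.

    ordered_mutations is a list of (position, new_base) pairs. Motifs are
    re-read after every mutation because a change at one site alters the
    motifs of its neighbors.
    """
    seq = list(start_seq)
    at_risk = set(range(len(seq)))
    total = 0.0
    for pos, new_base in ordered_mutations:
        current = "".join(seq)
        denom = sum(math.exp(log_hazard(theta, current, j)) for j in at_risk)
        total += log_hazard(theta, current, pos) - math.log(denom)
        seq[pos] = new_base
        at_risk.remove(pos)  # assumption: each site mutates at most once
    return total


def penalized_objective(theta, start_seq, ordered_mutations, lam):
    """Negative log partial likelihood plus an L1 penalty on theta."""
    penalty = lam * sum(abs(t) for t in theta)
    return -log_partial_likelihood(theta, start_seq, ordered_mutations) + penalty


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty; shrinks small values to zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


# With all coefficients at zero every at-risk site is equally likely, so two
# mutations in a length-5 sequence give an objective of log(5) + log(4).
theta0 = [0.0] * len(MOTIFS)
objective = penalized_objective(theta0, "ACGTA", [(1, "T"), (3, "A")], lam=0.5)
```

The soft-thresholding helper is what a proximal gradient update would apply to each coefficient. It is the mechanism by which an $\ell_{1}$ penalty sets many motif effects exactly to zero, which matches the parsimony the abstract refers to.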



Jean Feng, David A. Shaw, Vladimir N. Minin, Noah Simon, Frederick A. Matsen IV. "Survival analysis of DNA mutation motifs with penalized proportional hazards." Ann. Appl. Stat. 13 (2) 1268-1294, June 2019.


Received: 1 November 2017; Revised: 1 September 2018; Published: June 2019
First available in Project Euclid: 17 June 2019

zbMATH: 1423.62142
MathSciNet: MR3963571
Digital Object Identifier: 10.1214/18-AOAS1233

Keywords: Antibody maturation, Lasso, Monte Carlo expectation–maximization, somatic hypermutation, Survival analysis

Rights: Copyright © 2019 Institute of Mathematical Statistics
