Open Access
Kernel-penalized regression for analysis of microbiome data
Timothy W. Randolph, Sen Zhao, Wade Copeland, Meredith Hullar, Ali Shojaie
Ann. Appl. Stat. 12(1): 540-566 (March 2018). DOI: 10.1214/17-AOAS1102


The analysis of human microbiome data is often based on dimension-reduced graphical displays and clusterings derived from vectors of microbial abundances in each sample. Common to these ordination methods is the use of biologically motivated definitions of similarity. Principal coordinate analysis, in particular, is often performed using ecologically defined distances, allowing analyses to incorporate context-dependent, non-Euclidean structure. In this paper, we go beyond dimension-reduced ordination methods and describe a framework of high-dimensional regression models that extends these distance-based methods. In particular, we use kernel-based methods to show how to incorporate a variety of extrinsic information, such as phylogeny, into penalized regression models that estimate taxon-specific associations with a phenotype or clinical outcome. Further, we show how this regression framework can be used to address the compositional nature of multivariate predictors composed of relative abundances; that is, vectors whose entries sum to a constant. We illustrate this approach with several simulations using data from two recent studies on gut and vaginal microbiomes. We conclude with an application to our own data, where we also incorporate a significance test for the estimated coefficients that represent associations between microbial abundance and percent fat.
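
As a rough illustration of the general idea of moving from a distance-based ordination to a kernel-based regression, the following is a minimal sketch and not the authors' implementation. It assumes Bray-Curtis dissimilarity as the ecological distance, the Gower-centered matrix used in principal coordinate analysis as the kernel, and a plain kernel ridge fit; the function names, the simulated data, and the penalty value `lam` are all hypothetical.

```python
import numpy as np

def bray_curtis(X):
    """Pairwise Bray-Curtis dissimilarity between rows of a nonnegative matrix."""
    n = X.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            num = np.abs(X[i] - X[j]).sum()
            den = (X[i] + X[j]).sum()
            D[i, j] = D[j, i] = num / den
    return D

def gower_kernel(D):
    """Gower-centered similarity matrix K = -1/2 * J D^2 J, as used in PCoA."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    K = -0.5 * J @ (D ** 2) @ J
    # Bray-Curtis is not Euclidean, so K can have negative eigenvalues.
    # Clipping them to zero is one simple (assumed) way to get a valid kernel.
    w, V = np.linalg.eigh(K)
    w = np.clip(w, 0.0, None)
    return (V * w) @ V.T

def kernel_ridge_fit(K, y, lam):
    """Solve (K + lam * I) alpha = y - mean(y) for the dual coefficients."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y - y.mean())

# Simulated data (hypothetical): 20 samples, 8 taxa, relative abundances.
rng = np.random.default_rng(0)
counts = rng.poisson(5.0, size=(20, 8)) + 1      # +1 avoids empty samples
X = counts / counts.sum(axis=1, keepdims=True)   # each row sums to 1
y = rng.normal(size=20)                          # placeholder outcome

D = bray_curtis(X)
K = gower_kernel(D)
alpha = kernel_ridge_fit(K, y, lam=0.1)
fitted = y.mean() + K @ alpha                    # in-sample fitted values
```

Swapping `bray_curtis` for another distance (for example a phylogeny-aware one) changes only the kernel, which is the sense in which extrinsic structure enters the penalty in a sketch like this.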



Timothy W. Randolph, Sen Zhao, Wade Copeland, Meredith Hullar, Ali Shojaie. "Kernel-penalized regression for analysis of microbiome data." Ann. Appl. Stat. 12 (1) 540-566, March 2018.


Received: 1 January 2017; Revised: 1 May 2017; Published: March 2018
First available in Project Euclid: 9 March 2018

zbMATH: 06894717
MathSciNet: MR3773404
Digital Object Identifier: 10.1214/17-AOAS1102

Keywords: Compositional data, distance-based analysis, kernel methods, microbial community data, penalized regression

Rights: Copyright © 2018 Institute of Mathematical Statistics

