Open Access
Regression analysis for microbiome compositional data
Pixu Shi, Anru Zhang, Hongzhe Li
Ann. Appl. Stat. 10(2): 1019-1040 (June 2016). DOI: 10.1214/16-AOAS928

Abstract

One important problem in microbiome analysis is to identify the bacterial taxa that are associated with a response, where the microbiome data are summarized as the composition of the bacterial taxa at different taxonomic levels. This paper considers regression analysis with such compositional data as covariates. To ensure that the results are subcompositionally coherent, linear models with a set of linear constraints on the regression coefficients are introduced. Such models allow regression analysis for subcompositions and include the log-contrast model for compositional covariates as a special case. A penalized estimation procedure is developed for estimating the regression coefficients and for selecting variables under the linear constraints. A method is also proposed to obtain debiased estimates of the regression coefficients that are asymptotically unbiased and have a joint asymptotic multivariate normal distribution. This provides valid confidence intervals for the regression coefficients and can be used to obtain $p$-values. Simulation results show that the confidence intervals are valid and that the debiased estimates have smaller variances when the linear constraints are imposed. The proposed methods are applied to a gut microbiome data set and identify four bacterial genera that are associated with body mass index after adjusting for total fat and caloric intake.
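
As a rough illustration of the log-contrast special case mentioned above, the following sketch fits a linear model on log-transformed compositions with the coefficients constrained to sum to zero. It is not the penalized or debiased procedure of the paper: it uses plain unpenalized least squares, the data are simulated, and the function name `zero_sum_least_squares` and all variable names are hypothetical. The point is only to show that, under the zero-sum constraint, the fit does not change when each sample is rescaled by an arbitrary positive factor.

```python
import numpy as np

def zero_sum_least_squares(Z, y):
    """Least squares for y = Z @ beta + noise subject to sum(beta) == 0.

    Hypothetical helper for illustration; unpenalized, no intercept.
    """
    p = Z.shape[1]
    # Columns of B span {beta : sum(beta) = 0}, so beta = B @ gamma
    # turns the constrained problem into an unconstrained one in gamma.
    B = np.vstack([np.eye(p - 1), -np.ones((1, p - 1))])
    gamma, *_ = np.linalg.lstsq(Z @ B, y, rcond=None)
    return B @ gamma

# Simulated data (assumption): positive abundances normalized to compositions.
rng = np.random.default_rng(0)
n, p = 50, 5
abundance = rng.gamma(shape=2.0, scale=1.0, size=(n, p))
X = abundance / abundance.sum(axis=1, keepdims=True)

beta_true = np.array([1.0, -1.0, 0.5, -0.5, 0.0])  # sums to zero
y = np.log(X) @ beta_true + 0.1 * rng.normal(size=n)

beta_hat = zero_sum_least_squares(np.log(X), y)

# Rescale every sample by its own positive factor and refit.
scale = rng.uniform(0.5, 2.0, size=(n, 1))
beta_rescaled = zero_sum_least_squares(np.log(X * scale), y)

print("sum of coefficients:", beta_hat.sum())
print("max change after rescaling:", np.abs(beta_hat - beta_rescaled).max())
```

Rescaling adds a per-sample constant to every log-covariate, and that constant is multiplied by the sum of the coefficients, which the constraint fixes at zero. This is why the two fits agree up to floating-point error.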

Citation

Pixu Shi, Anru Zhang, Hongzhe Li. "Regression analysis for microbiome compositional data." Ann. Appl. Stat. 10 (2) 1019-1040, June 2016. https://doi.org/10.1214/16-AOAS928

Information

Received: 1 June 2015; Revised: 1 January 2016; Published: June 2016
First available in Project Euclid: 22 July 2016

zbMATH: 06625679
MathSciNet: MR3528370
Digital Object Identifier: 10.1214/16-AOAS928

Keywords: compositional coherence, coordinate descent method of multipliers, high dimension, log-contrast model, model selection, regularization

Rights: Copyright © 2016 Institute of Mathematical Statistics
