Structured subcomposition selection in regression and its application to microbiome data analysis

Tao Wang; Hongyu Zhao

doi:10.1214/16-AOAS1017

Ann. Appl. Stat. 11(2): 771-791 (June 2017). DOI: 10.1214/16-AOAS1017

Abstract

Compositional data arise naturally in many practical problems, and their analysis presents many statistical challenges, especially in high dimensions. In this article, we consider the problem of subcomposition selection in regression with compositional covariates, where the relationships among the covariates can be represented by a tree whose leaf nodes correspond to the covariates. Assuming that the tree structure is available as prior knowledge, we adopt a symmetric version of the linear log-contrast model and propose a tree-guided regularization method for this structured subcomposition selection. Our method is based on a novel penalty function that incorporates the tree structure information node by node, encouraging the selection of subcompositions at subtree levels. We show that this optimization problem can be formulated as a generalized lasso problem, whose solution can be computed efficiently using existing algorithms. An application to a human gut microbiome study and simulations are presented to compare the performance of the proposed method with an $\ell_1$ regularization method that does not use the tree structure information.
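
To make the two ingredients named in the abstract concrete, here is a minimal sketch, assuming the standard symmetric form of the linear log-contrast model, $y_i = \sum_j \beta_j \log x_{ij} + \varepsilon_i$ subject to $\sum_j \beta_j = 0$, and the generic generalized lasso objective $\tfrac{1}{2}\lVert y - Z\beta \rVert_2^2 + \lambda \lVert D\beta \rVert_1$ with $Z = \log X$. The function names are hypothetical, and the tree-guided penalty matrix $D$ is not reproduced here: the example uses $D = I$, which reduces the objective to the plain $\ell_1$ (lasso) penalty of the comparison method. The zero-sum constraint is what makes the predictor invariant to rescaling a sample, so raw counts and proportions give the same value.

```python
import math


def log_contrast_predict(x_row, beta):
    """Linear predictor sum_j beta_j * log(x_j) for one sample.

    Assumes every component of x_row is strictly positive (zeros would have to
    be replaced beforehand) and that the coefficients satisfy sum(beta) == 0.
    """
    if abs(sum(beta)) > 1e-12:
        raise ValueError("log-contrast coefficients must sum to zero")
    return sum(b * math.log(x) for b, x in zip(beta, x_row))


def generalized_lasso_objective(y, X, beta, D, lam):
    """Evaluate 0.5 * ||y - Z beta||^2 + lam * ||D beta||_1 with Z = log(X).

    D is a list of penalty rows; D = identity gives the ordinary lasso penalty.
    A tree-guided D (hypothetical here) would encode the tree structure instead.
    """
    resid = [yi - log_contrast_predict(xi, beta) for yi, xi in zip(y, X)]
    loss = 0.5 * sum(r * r for r in resid)
    penalty = sum(abs(sum(d * b for d, b in zip(row, beta))) for row in D)
    return loss + lam * penalty
```

For example, with `beta = [1.0, -0.5, -0.5]`, the proportions `[0.2, 0.3, 0.5]` and the counts `[20, 30, 50]` yield the same predictor (up to floating-point rounding), because the common scale factor contributes $\log c \sum_j \beta_j = 0$.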

Citation

Tao Wang and Hongyu Zhao. "Structured subcomposition selection in regression and its application to microbiome data analysis." Ann. Appl. Stat. 11(2): 771-791, June 2017. https://doi.org/10.1214/16-AOAS1017

Information

Received: 1 November 2015; Revised: 1 December 2016; Published: June 2017

First available in Project Euclid: 20 July 2017

zbMATH: 06775892

MathSciNet: MR3693546

Keywords: Compositional data analysis, feature selection, homogeneity, log-ratio transformations, penalized regression, phylogenetic tree, the lasso