Open Access
2024 Direct covariance matrix estimation with compositional data
Aaron J. Molstad, Karl Oskar Ekvall, Piotr M. Suder
Electron. J. Statist. 18(1): 1702-1748 (2024). DOI: 10.1214/24-EJS2222


Compositional data arise in many areas of research in the natural and biomedical sciences. One prominent example is in the study of the human gut microbiome, where one can measure the relative abundance of many distinct microorganisms in a subject’s gut. Often, practitioners are interested in learning how the dependencies between microbes vary across distinct populations or experimental conditions. In statistical terms, the goal is to estimate a covariance matrix for the (latent) log-abundances of the microbes in each of the populations. However, the compositional nature of the data prevents the use of standard estimators for these covariance matrices. In this article, we propose an estimator of multiple covariance matrices which allows for information sharing across distinct populations of samples. Compared to some existing estimators, which estimate the covariance matrices of interest indirectly, our estimator is direct, ensures positive definiteness, and is the solution to a convex optimization problem. We compute our estimator using a proximal-proximal gradient descent algorithm. Asymptotic properties of our estimator reveal that it can perform well in high-dimensional settings. We show that our method provides more reliable estimates than competitors in an analysis of microbiome data from subjects with myalgic encephalomyelitis/chronic fatigue syndrome and through simulation studies.
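To illustrate why the compositional constraint rules out standard estimators, the following minimal sketch simulates latent log-abundances, converts them to compositions, and applies the centered log-ratio (clr) transform, a common workaround. It is not the estimator proposed in the article. The dimension, sample size, and AR(1)-type covariance `omega` are assumptions chosen purely for illustration. The sketch checks two facts: the sample covariance of the clr-transformed data equals the doubly centered latent sample covariance `G @ S_latent @ G`, with `G = I - 11'/p`, rather than the latent covariance itself, and it is singular, so it cannot be positive definite.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 5, 200  # hypothetical number of taxa and of samples

# Assumed "true" covariance of the latent log-abundances (AR(1)-type)
idx = np.arange(p)
omega = 0.5 ** np.abs(np.subtract.outer(idx, idx))

# Latent log-abundances W (unobserved in practice) and observed compositions X
W = rng.multivariate_normal(np.zeros(p), omega, size=n)
X = np.exp(W)
X /= X.sum(axis=1, keepdims=True)  # each row now sums to 1

# Centered log-ratio transform: subtract each row's mean log-proportion
logX = np.log(X)
clr = logX - logX.mean(axis=1, keepdims=True)

S_latent = np.cov(W, rowvar=False)  # not computable from real data
S_clr = np.cov(clr, rowvar=False)   # computable from the compositions

# Centering matrix G = I - 11'/p
G = np.eye(p) - np.ones((p, p)) / p

# The clr covariance is the doubly centered latent covariance
identity_holds = np.allclose(S_clr, G @ S_latent @ G)

# G has rank p - 1, so S_clr has a zero eigenvalue (up to rounding)
min_eig = np.linalg.eigvalsh(S_clr).min()

print("clr covariance equals G S G:", identity_holds)
print("smallest eigenvalue of clr covariance:", min_eig)
```

Because the clr covariance only determines the latent covariance up to this centering, recovering the latent covariance requires additional structure or assumptions, which is the gap that estimators of the kind described in the abstract aim to address.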

Funding Statement

Aaron J. Molstad was supported in part by NSF DMS-2113589. Piotr M. Suder was supported in part by the University Scholars Program at the University of Florida.


The authors thank the associate editor and two referees for their insightful comments and suggestions.



Aaron J. Molstad. Karl Oskar Ekvall. Piotr M. Suder. "Direct covariance matrix estimation with compositional data." Electron. J. Statist. 18 (1) 1702 - 1748, 2024.


Received: 1 March 2023; Published: 2024
First available in Project Euclid: 18 April 2024

Digital Object Identifier: 10.1214/24-EJS2222

Keywords: compositional data, convex optimization, covariance matrix estimation, microbiome data analysis, positive definiteness
