Open Access
June 2020
Estimation and inference in metabolomics with nonrandom missing data and latent factors
Chris McKennan, Carole Ober, Dan Nicolae
Ann. Appl. Stat. 14(2): 789-808 (June 2020). DOI: 10.1214/20-AOAS1328

Abstract

High-throughput metabolomics data are fraught with both nonignorable missing observations and unobserved factors that influence a metabolite’s measured concentration, and it is well known that ignoring either of these complications can compromise estimators. However, current methods to analyze these data can account for either the missing data or the unobserved factors, but not both. We therefore developed MetabMiss, a statistically rigorous method to account for both nonrandom missing data and latent factors in high-throughput metabolomics data. Our methodology does not require the practitioner to specify a likelihood for the missing data, and makes investigating the relationship between the metabolome and tens, or even hundreds, of phenotypes computationally tractable. We demonstrate the fidelity of MetabMiss’s estimates using both simulated and real metabolomics data and prove their asymptotic correctness when the sample size and number of metabolites grow to infinity.

Citation

Chris McKennan, Carole Ober, Dan Nicolae. "Estimation and inference in metabolomics with nonrandom missing data and latent factors." Ann. Appl. Stat. 14(2): 789-808, June 2020. https://doi.org/10.1214/20-AOAS1328

Information

Received: 1 September 2019; Revised: 1 February 2020; Published: June 2020
First available in Project Euclid: 29 June 2020

zbMATH: 07239884
MathSciNet: MR4117830
Digital Object Identifier: 10.1214/20-AOAS1328

Keywords: batch variables, generalized method of moments, latent factors, Metabolomics, missing not at random (MNAR)

Rights: Copyright © 2020 Institute of Mathematical Statistics
