Open Access
December 2018
The effects of nonignorable missing data on label-free mass spectrometry proteomics experiments
Jonathon J. O’Brien, Harsha P. Gunawardena, Joao A. Paulo, Xian Chen, Joseph G. Ibrahim, Steven P. Gygi, Bahjat F. Qaqish
Ann. Appl. Stat. 12(4): 2075-2095 (December 2018). DOI: 10.1214/18-AOAS1144

Abstract

An idealized version of a label-free discovery mass spectrometry proteomics experiment would provide absolute abundance measurements for a whole proteome, across varying conditions. Unfortunately, this ideal is not realized. Measurements are made on peptides, so an inferential step is required to obtain protein-level estimates. The inference is complicated by experimental factors that necessitate relative abundance estimation and result in widespread nonignorable missing data. Relative abundance on the log scale takes the form of parameter contrasts. In a complete-case analysis, contrast estimates may be biased by missing data, and a substantial amount of useful information will often go unused.
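The complete-case bias can be made concrete with a small simulation. This is only a sketch and not the paper's model. The logistic missingness curve and its parameters (`gamma0`, `gamma1`), the two group means, the normal noise, and the sample size are all hypothetical choices. The one feature that matters is that a value's chance of going missing depends on the value itself, with low intensities dropping out more often. That dependence is what makes the missingness nonignorable.

```python
import math
import random
import statistics

def missing_prob(y, gamma0=-42.0, gamma1=2.0):
    """Hypothetical logistic missingness curve (not from the paper).

    Probability of non-detection falls as log2 intensity y rises; it is 0.5
    at y = -gamma0 / gamma1 = 21. Because it depends on y itself, the
    mechanism is missing not at random (nonignorable).
    """
    return 1.0 / (1.0 + math.exp(gamma0 + gamma1 * y))

def simulate(n=20000, mu_a=20.0, mu_b=22.0, sd=1.0, seed=1):
    """Draw full log-scale data for two conditions, then censor each value."""
    rng = random.Random(seed)
    full = {"A": [rng.gauss(mu_a, sd) for _ in range(n)],
            "B": [rng.gauss(mu_b, sd) for _ in range(n)]}
    # Keep a value only if it survives its own value-dependent dropout draw.
    observed = {k: [y for y in v if rng.random() >= missing_prob(y)]
                for k, v in full.items()}
    return full, observed

full, observed = simulate()
true_contrast = 22.0 - 20.0
full_contrast = statistics.fmean(full["B"]) - statistics.fmean(full["A"])
cc_contrast = statistics.fmean(observed["B"]) - statistics.fmean(observed["A"])
frac_missing = {k: 1 - len(observed[k]) / len(full[k]) for k in full}

print(f"true {true_contrast:.2f}  full data {full_contrast:.2f}  "
      f"complete case {cc_contrast:.2f}")
print(frac_missing)
```

With these settings condition A loses far more values than condition B. Its observed mean is therefore pulled upward much more, so the complete-case contrast falls well short of the true difference of 2. The full-data contrast stays close to 2.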

To avoid problems with missing data, many analysts have turned to single imputation solutions. Unfortunately, these methods often create further difficulties by hiding inestimable contrasts, preventing the recovery of interblock information and failing to account for imputation uncertainty. To mitigate many of the problems caused by missing values, we propose the use of a Bayesian selection model. Our model is tested on simulated data, on real data with simulated missing values, and on a ground-truth dilution experiment in which all of the true relative changes are known. The analysis suggests that our model, compared with various imputation strategies and complete-case analyses, can increase accuracy and provide substantial improvements to interval coverage.
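The toy sketch below illustrates two of the imputation problems named above. It assumes minimum-value imputation as the single-imputation strategy. That is one common choice, used here purely for illustration, and it is not necessarily one of the strategies the paper compares. The peptide intensities and the fill value `global_min` are hypothetical. The helper names are likewise illustrative.

```python
import math
import statistics

def contrast_complete_case(a, b):
    """Difference in mean log intensity (b minus a) using observed values only.

    Returns None when a group has no observations, because the contrast is
    then not estimable from the data.
    """
    obs_a = [y for y in a if y is not None]
    obs_b = [y for y in b if y is not None]
    if not obs_a or not obs_b:
        return None
    return statistics.fmean(obs_b) - statistics.fmean(obs_a)

def impute_minimum(values, fill):
    """Single imputation: replace every missing value with one fixed number."""
    return [fill if y is None else y for y in values]

def welch_se(a, b):
    """Unpooled (Welch-style) standard error of a difference in means."""
    return math.sqrt(statistics.variance(a) / len(a)
                     + statistics.variance(b) / len(b))

# Hypothetical log2 intensities for one peptide: seen in A, never seen in B.
cond_a = [21.3, 20.8, 21.1]
cond_b = [None, None, None]
global_min = 18.0  # hypothetical lowest intensity observed anywhere

print(contrast_complete_case(cond_a, cond_b))

imp_a = impute_minimum(cond_a, global_min)
imp_b = impute_minimum(cond_b, global_min)
est = statistics.fmean(imp_b) - statistics.fmean(imp_a)
print(round(est, 2), round(welch_se(imp_a, imp_b), 3))
```

With no observations in B, the complete-case function returns `None` because the data cannot estimate that contrast. After imputation the contrast comes out near -3.07 log2 units. That number is driven entirely by the arbitrary fill value. Its standard error also treats the three identical fills as real measurements with zero spread, so any interval built from it is too narrow.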

Citation

Jonathon J. O’Brien, Harsha P. Gunawardena, Joao A. Paulo, Xian Chen, Joseph G. Ibrahim, Steven P. Gygi, Bahjat F. Qaqish. "The effects of nonignorable missing data on label-free mass spectrometry proteomics experiments." Ann. Appl. Stat. 12(4): 2075-2095, December 2018. https://doi.org/10.1214/18-AOAS1144

Information

Received: 1 March 2017; Revised: 1 January 2018; Published: December 2018
First available in Project Euclid: 13 November 2018

zbMATH: 07029447
MathSciNet: MR3875693
Digital Object Identifier: 10.1214/18-AOAS1144

Keywords: Bayesian inference, data-dependent analysis, estimable contrasts, imputation, interval coverage, selection model

Rights: Copyright © 2018 Institute of Mathematical Statistics