The Annals of Applied Statistics

The effects of nonignorable missing data on label-free mass spectrometry proteomics experiments

Jonathon J. O’Brien, Harsha P. Gunawardena, Joao A. Paulo, Xian Chen, Joseph G. Ibrahim, Steven P. Gygi, and Bahjat F. Qaqish

An idealized version of a label-free discovery mass spectrometry proteomics experiment would provide absolute abundance measurements for a whole proteome across varying conditions. Unfortunately, this ideal is not realized. Measurements are made on peptides, requiring an inferential step to obtain protein-level estimates. The inference is complicated by experimental factors that necessitate relative abundance estimation and result in widespread nonignorable missing data. Relative abundance on the log scale takes the form of parameter contrasts. In a complete-case analysis, contrast estimates may be biased by missing data, and a substantial amount of useful information will often go unused.
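The complete-case bias can be made concrete with a small simulation sketch. This is not the authors' model, and every setting here is a hypothetical assumption: log2 intensities are normal with standard deviation 1, the two conditions differ by 1 on the log2 scale, and detection follows a logistic curve (`p_observed`, center 20.5, slope 2), so low-abundance values are more likely to be missing. Because the lower-abundance condition loses more of its low values, its observed mean is pushed up more, and the complete-case contrast should land noticeably below the true value of 1.

```python
import math
import random
import statistics


def p_observed(y, center=20.5, slope=2.0):
    """Hypothetical detection probability: logistic in the log2 intensity y.
    Because it depends on y itself (observed or not), missingness is nonignorable."""
    return 1.0 / (1.0 + math.exp(-slope * (y - center)))


def simulate_group(mu, n, rng, sd=1.0):
    """Draw n log2 intensities; return (complete data, observed subset)."""
    full = [rng.gauss(mu, sd) for _ in range(n)]
    observed = [y for y in full if rng.random() < p_observed(y)]
    return full, observed


def complete_case_demo(mu_a=20.0, mu_b=21.0, n=20000, seed=1):
    """Compare the log-scale contrast mu_b - mu_a estimated from complete
    data versus from the observed (complete-case) values only."""
    rng = random.Random(seed)
    full_a, obs_a = simulate_group(mu_a, n, rng)
    full_b, obs_b = simulate_group(mu_b, n, rng)
    return {
        "true_contrast": mu_b - mu_a,
        "full_data_estimate": statistics.fmean(full_b) - statistics.fmean(full_a),
        "complete_case_estimate": statistics.fmean(obs_b) - statistics.fmean(obs_a),
        "missing_frac_a": 1 - len(obs_a) / n,
        "missing_frac_b": 1 - len(obs_b) / n,
    }


if __name__ == "__main__":
    for name, value in complete_case_demo().items():
        print(f"{name}: {value:.3f}")
```

The large `n` is only there so that sampling noise is small relative to the bias; the bias itself comes from the intensity-dependent missingness, not from the sample size.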

To avoid problems with missing data, many analysts have turned to single-imputation solutions. Unfortunately, these methods often create further difficulties by hiding inestimable contrasts, preventing the recovery of interblock information, and failing to account for imputation uncertainty. To mitigate many of the problems caused by missing values, we propose the use of a Bayesian selection model. Our model is tested on simulated data, on real data with simulated missing values, and on a ground-truth dilution experiment in which all of the true relative changes are known. The analysis suggests that our model, compared with various imputation strategies and complete-case analyses, can increase accuracy and provide substantial improvements to interval coverage.
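As a rough illustration of what a selection model adds, the sketch below writes down the observed-data likelihood for a single peptide under a deliberately simple, hypothetical specification: y ~ N(mu, sigma^2) on the log scale and P(missing | y) = logistic(phi0 + phi1 * y). Observed values contribute their density times the probability of being observed; missing values contribute the integral of the density times the probability of being missing, so missing entries still carry information about mu. This is not the authors' model. A Bayesian treatment would place priors on (mu, sigma, phi0, phi1) and sample from the posterior (for example, by MCMC) rather than evaluate the likelihood at fixed values.

```python
import math


def logistic(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def normal_logpdf(y, mu, sigma):
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def p_missing(y, phi0, phi1):
    """Hypothetical MNAR mechanism: P(m = 1 | y) = logistic(phi0 + phi1 * y)."""
    return logistic(phi0 + phi1 * y)


def prob_missing_marginal(mu, sigma, phi0, phi1, n_grid=2001):
    """P(m = 1) = integral of N(y; mu, sigma^2) * P(m = 1 | y) dy,
    approximated by the trapezoid rule on [mu - 8 sigma, mu + 8 sigma]."""
    lo, hi = mu - 8.0 * sigma, mu + 8.0 * sigma
    h = (hi - lo) / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        y = lo + k * h
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += weight * math.exp(normal_logpdf(y, mu, sigma)) * p_missing(y, phi0, phi1)
    return total * h


def selection_loglik(values, mu, sigma, phi0, phi1):
    """Observed-data log-likelihood for one peptide under the toy selection model.
    `values` holds floats for observed log intensities and None for missing ones."""
    log_p_mis = math.log(prob_missing_marginal(mu, sigma, phi0, phi1))
    ll = 0.0
    for y in values:
        if y is None:
            # Missing: the unobserved intensity is integrated out.
            ll += log_p_mis
        else:
            # Observed: density times the probability of being observed.
            ll += normal_logpdf(y, mu, sigma) + math.log1p(-p_missing(y, phi0, phi1))
    return ll
```

For example, `selection_loglik([20.3, None, 21.1, None], mu=20.5, sigma=1.0, phi0=41.0, phi1=-2.0)` evaluates the toy likelihood for two observed and two missing runs. With these hypothetical settings, the chance of a missing value is 0.5 at a log intensity of 20.5 and rises for lower intensities, so each missing entry pulls the likelihood toward smaller values of mu instead of being discarded (complete-case) or replaced by a fixed number (single imputation).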

Ann. Appl. Stat., Volume 12, Number 4 (2018), 2075-2095.

Received: March 2017
Revised: January 2018
Keywords: Data dependent analysis, estimable contrasts, selection model, Bayesian inference, imputation, interval coverage


O’Brien, Jonathon J.; Gunawardena, Harsha P.; Paulo, Joao A.; Chen, Xian; Ibrahim, Joseph G.; Gygi, Steven P.; Qaqish, Bahjat F. The effects of nonignorable missing data on label-free mass spectrometry proteomics experiments. Ann. Appl. Stat. 12 (2018), no. 4, 2075-2095. doi:10.1214/18-AOAS1144.

  • Supplementary material:
    • Additional text containing proofs, simulation details, algorithmic and experimental procedures.
    • W2W16 Tables: Data tables and results pertaining to the two sample breast cancer data.
    • Dilution Tables part 1: Data tables and results pertaining to the ground truth dilution experiment.
    • Dilution Tables part 2: Data tables and results pertaining to the ground truth dilution experiment.
    • Dilution Tables part 3: Data tables and results pertaining to the ground truth dilution experiment.