A mixed-effects model for incomplete data from labeling-based quantitative proteomics experiments

Lin S. Chen; Jiebiao Wang; Xianlong Wang; Pei Wang

doi:10.1214/16-AOAS994

Ann. Appl. Stat. 11(1): 114-138 (March 2017). DOI: 10.1214/16-AOAS994

Abstract

In mass spectrometry (MS) based quantitative proteomics research, the emerging iTRAQ (isobaric tag for relative and absolute quantitation) and TMT (tandem mass tags) techniques have been widely adopted for high-throughput protein profiling. In a typical iTRAQ/TMT proteomics study, samples are grouped into batches, and each batch is processed by one multiplex experiment, in which the abundances of thousands of proteins/peptides in a batch of samples can be measured simultaneously. The multiplex labeling technique greatly enhances the throughput of protein quantification. However, the technical variation across different iTRAQ/TMT multiplex experiments is often large due to the dynamic nature of MS instruments, which leads to strong batch effects in the iTRAQ/TMT data. Moreover, the iTRAQ/TMT data often contain substantial batch-level nonignorable missing entries. Specifically, the abundance measures of a given protein/peptide are often either observed or missing altogether in all the samples from the same batch, with the missing probability depending on the combined batch-level abundances. We term this missing-data mechanism the Batch-level Abundance-Dependent Missing-data Mechanism (BADMM). We introduce a new method, mixEMM, for analyzing iTRAQ/TMT data with batch effects and batch-level nonignorable missingness. The mixEMM method employs a linear mixed-effects model and explicitly models the batch effects and the BADMM. In simulation studies, we show that, compared with existing approaches that use relative abundances and ignore the missing batches under the missing-completely-at-random assumption, the mixEMM method achieves more accurate parameter estimation and inference. We applied the method to an iTRAQ proteomics dataset from a breast cancer study and identified phosphopeptides differentially expressed between breast cancer subtypes. The method can be applied to general clustered data with cluster-level nonignorable missing-data mechanisms.
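
To make the missing-data mechanism described in the abstract concrete, the following is a minimal simulation sketch, not the authors' implementation. It generates abundances for one protein across many batches with a random batch effect, then drops whole batches with a probability that depends on the batch mean. The logistic form of the missing probability, the assumption that lower-abundance batches are more likely to be missing, and all names and parameter values (`simulate_badmm`, `gamma0`, `gamma1`, and so on) are illustrative assumptions that do not come from the paper.

```python
import numpy as np


def simulate_badmm(n_batches=2000, batch_size=4, mu=0.0, sigma_batch=1.0,
                   sigma_eps=0.5, gamma0=0.0, gamma1=2.0, seed=0):
    """Simulate one protein's abundances with batch effects and
    batch-level abundance-dependent missingness (hypothetical setup)."""
    rng = np.random.default_rng(seed)

    # Random batch effect shared by all samples in a batch, plus sample noise.
    batch_effect = rng.normal(0.0, sigma_batch, size=(n_batches, 1))
    noise = rng.normal(0.0, sigma_eps, size=(n_batches, batch_size))
    y = mu + batch_effect + noise

    # Assumed mechanism: the probability that a whole batch is missing
    # decreases with the batch-mean abundance (logistic link, gamma1 > 0).
    batch_mean = y.mean(axis=1)
    p_missing = 1.0 / (1.0 + np.exp(gamma0 + gamma1 * batch_mean))
    missing = rng.random(n_batches) < p_missing

    # A missing batch loses every sample in it, never just some of them.
    y_obs = y.copy()
    y_obs[missing, :] = np.nan
    return y, y_obs, missing


if __name__ == "__main__":
    y, y_obs, missing = simulate_badmm()
    print("fraction of batches missing:", missing.mean())
    print("mean of complete data:      ", y.mean())
    print("mean of observed data only: ", np.nanmean(y_obs))
```

Under these assumptions, the mean of the observed batches overstates the true mean, because low-abundance batches are preferentially dropped. This is the kind of bias that arises when missing batches are ignored as if they were missing completely at random.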

Citation

Lin S. Chen, Jiebiao Wang, Xianlong Wang, Pei Wang. "A mixed-effects model for incomplete data from labeling-based quantitative proteomics experiments." Ann. Appl. Stat. 11 (1) 114-138, March 2017. https://doi.org/10.1214/16-AOAS994

Information

Received: 1 June 2015; Revised: 1 September 2016; Published: March 2017

First available in Project Euclid: 8 April 2017

zbMATH: 1366.62207

MathSciNet: MR3634317

Digital Object Identifier: 10.1214/16-AOAS994

Keywords: Batch-level Abundance-Dependent Missing-data Mechanism (BADMM), Mixed-effects models, the expectation-conditional-maximization (ECM) algorithm