The Annals of Applied Statistics
- Ann. Appl. Stat.
- Volume 4, Number 2 (2010), 962-987.
A nested mixture model for protein identification using mass spectrometry
Qunhua Li, Michael J. MacCoss, and Matthew Stephens
Abstract
Mass spectrometry provides a high-throughput way to identify proteins in biological samples. In a typical experiment, proteins in a sample are first broken into their constituent peptides. The resulting mixture of peptides is then subjected to mass spectrometry, which generates thousands of spectra, each characteristic of its generating peptide. Here we consider the problem of inferring, from these spectra, which proteins and peptides are present in the sample. We develop a statistical approach to the problem, based on a nested mixture model. In contrast to commonly used two-stage approaches, this model provides a one-stage solution that simultaneously identifies which proteins are present and which peptides are correctly identified. In this way, our model incorporates the evidence feedback between proteins and their constituent peptides. Using simulated data and a yeast data set, we compare and contrast our method with existing, widely used approaches (PeptideProphet/ProteinProphet) and with a recently published approach, HSM. For peptide identification, our one-stage approach yields consistently more accurate results. For protein identification, the methods have similar accuracy in most settings, although we exhibit some scenarios in which the existing methods perform poorly.
Article information
Dates
First available in Project Euclid: 3 August 2010
Permanent link to this document
http://projecteuclid.org/euclid.aoas/1280842148
Digital Object Identifier
doi:10.1214/09-AOAS316
Mathematical Reviews number (MathSciNet)
MR2758429
Zentralblatt MATH identifier
1194.62118
Keywords
Mixture model, nested structure, EM algorithm, protein identification, peptide identification, mass spectrometry, proteomics
Citation
Li, Qunhua; MacCoss, Michael J.; Stephens, Matthew. A nested mixture model for protein identification using mass spectrometry. Ann. Appl. Stat. 4 (2010), no. 2, 962-987. doi:10.1214/09-AOAS316. http://projecteuclid.org/euclid.aoas/1280842148.

