Mass spectrometry-based proteomics has become the tool of choice for identifying and quantifying the proteome of an organism. Though recent years have seen a tremendous improvement in instrument performance and the computational tools used, significant challenges remain, and there are many opportunities for statisticians to make important contributions. In the most widely used “bottom-up” approach to proteomics, complex mixtures of proteins are first subjected to enzymatic cleavage, the resulting peptide products are separated based on chemical or physical properties and analyzed using a mass spectrometer. The two fundamental challenges in the analysis of bottom-up MS-based proteomics are as follows: (1) Identifying the proteins that are present in a sample, and (2) Quantifying the abundance levels of the identified proteins. Both of these challenges require knowledge of the biological and technological context that gives rise to observed data, as well as the application of sound statistical principles for estimation and inference. We present an overview of bottom-up proteomics and outline the key statistical issues that arise in protein identification and quantification.
"Liquid chromatography mass spectrometry-based proteomics: Biological and technological aspects." Ann. Appl. Stat. 4 (4) 1797 - 1823, December 2010. https://doi.org/10.1214/10-AOAS341