Abstract
Understanding subcellular protein localisation is an essential component in the analysis of context-specific protein function. Recent advances in quantitative mass spectrometry (MS) have led to high-resolution mapping of thousands of proteins to subcellular locations. Novel modelling considerations to capture the complex nature of these data are thus necessary. We approach analysis of spatial proteomics data in a nonparametric Bayesian framework, using K-component mixtures of Gaussian process regression models. The Gaussian process regression model accounts for correlation structure within a subcellular niche, with each mixture component capturing the distinct correlation structure observed within each niche. The availability of marker proteins (i.e., proteins with a priori known labelled locations) motivates a semi-supervised learning approach to inform the Gaussian process hyperparameters. Moreover, we provide an efficient Hamiltonian-within-Gibbs sampler for our model. Furthermore, we reduce the computational burden associated with inversion of covariance matrices by exploiting the structure in the covariance matrix. A tensor decomposition of our covariance matrices allows extended Trench and Durbin algorithms to be applied to reduce the computational complexity of inversion and hence accelerate computation. We provide detailed case studies on Drosophila embryos and mouse pluripotent embryonic stem cells to illustrate the benefit of semi-supervised functional Bayesian modelling of the data.
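As a rough illustration of why covariance structure matters for the inversion step, the sketch below compares a structured solve with a dense one. It is not the authors' implementation. It assumes equally spaced fractions and a stationary squared-exponential kernel with added noise, so that the covariance matrix is Toeplitz, and it uses SciPy's `solve_toeplitz` (a Levinson-Durbin recursion) as a stand-in for the extended Trench and Durbin algorithms mentioned above. The function names and hyperparameter values are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_toeplitz, toeplitz


def se_first_column(n_points, amplitude, length_scale, noise_var):
    """First column of a squared-exponential covariance on an equally spaced grid.

    Assumption: inputs are 0, 1, ..., n_points - 1, so the covariance depends
    only on the lag and the full matrix is symmetric Toeplitz.
    """
    lags = np.arange(n_points, dtype=float)
    col = amplitude**2 * np.exp(-0.5 * (lags / length_scale) ** 2)
    col[0] += noise_var  # observation noise enters on the diagonal only
    return col


def gp_quadratic_form(y, first_col):
    """Compute y^T C^{-1} y without forming or inverting the dense matrix C."""
    alpha = solve_toeplitz(first_col, y)  # Levinson-Durbin recursion, O(n^2)
    return float(y @ alpha)


# Simulated profile across 20 fractions (illustrative data only).
rng = np.random.default_rng(0)
y = rng.normal(size=20)
col = se_first_column(20, amplitude=1.0, length_scale=2.0, noise_var=0.1)

fast = gp_quadratic_form(y, col)
# Dense reference: build the full matrix and solve in O(n^3).
dense = float(y @ np.linalg.solve(toeplitz(col), y))
```

The structured solve needs only the first column of the covariance matrix, which is what makes repeated likelihood evaluations inside a sampler cheaper than a dense solve at each iteration.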
Funding Statement
While completing this work, OMC was a Wellcome Trust Mathematical Genomics and Medicine student supported financially by the School of Clinical Medicine, University of Cambridge. KSL and LG were supported by Wellcome Trust Award 110170/Z/15/Z. PDWK is supported by MRC project reference MC_UU_00002/13, and the National Institute for Health Research (Cambridge Biomedical Research Centre at the Cambridge University Hospitals NHS Foundation Trust).
The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care.
Citation
Oliver M. Crook, Kathryn S. Lilley, Laurent Gatto, Paul D. W. Kirk. "Semi-supervised nonparametric Bayesian modelling of spatial proteomics." Ann. Appl. Stat. 16 (4), 2554–2576, December 2022. https://doi.org/10.1214/22-AOAS1603
Information