Open Access
December 2022
Semi-supervised nonparametric Bayesian modelling of spatial proteomics
Oliver M. Crook, Kathryn S. Lilley, Laurent Gatto, Paul D. W. Kirk
Ann. Appl. Stat. 16(4): 2554-2576 (December 2022). DOI: 10.1214/22-AOAS1603

Abstract

Understanding subcellular protein localisation is an essential component in the analysis of context-specific protein function. Recent advances in quantitative mass spectrometry (MS) have led to high-resolution mapping of thousands of proteins to subcellular locations within the cell. Novel modelling considerations to capture the complex nature of these data are thus necessary. We approach analysis of spatial proteomics data in a nonparametric Bayesian framework, using K-component mixtures of Gaussian process regression models. The Gaussian process regression model accounts for correlation structure within a subcellular niche, with each mixture component capturing the distinct correlation structure observed within each niche. The availability of marker proteins (i.e., proteins with a priori known labelled locations) motivates a semi-supervised learning approach to inform the Gaussian process hyperparameters. We moreover provide an efficient Hamiltonian-within-Gibbs sampler for our model. Furthermore, we reduce the computational burden associated with inversion of covariance matrices by exploiting the structure in the covariance matrix. A tensor decomposition of our covariance matrices allows extended Trench and Durbin algorithms to be applied to reduce the computational complexity of inversion and hence accelerate computation. We provide detailed case studies on Drosophila embryos and mouse pluripotent embryonic stem cells to illustrate the benefit of semi-supervised functional Bayesian modelling of the data.
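
To illustrate the kind of structure the abstract refers to, the following is a minimal sketch, not the authors' method. It assumes a stationary squared-exponential kernel evaluated on equally spaced inputs, which gives a symmetric Toeplitz covariance matrix, and solves a linear system with the classical Levinson recursion (a close relative of the Durbin and Trench algorithms) in O(n^2) operations rather than the O(n^3) of a generic solver. The function names `rbf_first_column` and `levinson_solve`, the kernel choice, and all parameter values are assumptions made for illustration; the extended algorithms and tensor decomposition of the paper are not reproduced here.

```python
import math


def rbf_first_column(n, length_scale=2.0, variance=1.0, noise=0.1):
    """First column of a Toeplitz covariance matrix (hypothetical helper).

    Assumes a squared-exponential kernel on the equally spaced inputs
    0, 1, ..., n-1, plus a noise variance added to the diagonal.
    """
    col = [variance * math.exp(-0.5 * (i / length_scale) ** 2) for i in range(n)]
    col[0] += noise
    return col


def levinson_solve(col, b):
    """Solve T x = b for a symmetric positive-definite Toeplitz matrix T.

    T is defined by its first column `col`, so T[i][j] = col[|i - j|].
    Uses the classical Levinson recursion, which costs O(n^2) operations.
    """
    n = len(col)
    r0 = col[0]
    # Rescale so that the diagonal of T is 1; r[k-1] is the k-th off-diagonal.
    r = [c / r0 for c in col[1:]]
    rhs = [v / r0 for v in b]

    x = [rhs[0]]
    if n == 1:
        return x

    y = [-r[0]]      # solution of the order-1 Yule-Walker system
    alpha = -r[0]
    beta = 1.0
    for k in range(1, n):
        beta *= 1.0 - alpha * alpha
        # Extend the solution x from order k to order k + 1.
        mu = (rhs[k] - sum(r[i] * x[k - 1 - i] for i in range(k))) / beta
        x = [x[i] + mu * y[k - 1 - i] for i in range(k)] + [mu]
        if k < n - 1:
            # Durbin step: extend the Yule-Walker solution y to order k + 1.
            alpha = -(r[k] + sum(r[i] * y[k - 1 - i] for i in range(k))) / beta
            y = [y[i] + alpha * y[k - 1 - i] for i in range(k)] + [alpha]
    return x


if __name__ == "__main__":
    n = 6
    col = rbf_first_column(n)
    b = [1.0] * n
    x = levinson_solve(col, b)
    # Check the solution against the dense matrix-vector product.
    residual = max(
        abs(sum(col[abs(i - j)] * x[j] for j in range(n)) - b[i]) for i in range(n)
    )
    print("max residual:", residual)
```

Only the first column of the matrix is stored, so memory use is O(n) as well. Whether a given spatial proteomics covariance has exactly this form depends on the experimental design and kernel, which this sketch does not attempt to model.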

Funding Statement

While completing this work, OMC was a Wellcome Trust Mathematical Genomics and Medicine student supported financially by the School of Clinical Medicine, University of Cambridge. KSL and LG were supported by Wellcome Trust Award 110170/Z/15/Z. PDWK is supported by MRC project reference MC_UU_00002/13, and the National Institute for Health Research (Cambridge Biomedical Research Centre at the Cambridge University Hospitals NHS Foundation Trust).
The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care.

Citation

Oliver M. Crook, Kathryn S. Lilley, Laurent Gatto, Paul D. W. Kirk. "Semi-supervised nonparametric Bayesian modelling of spatial proteomics." Ann. Appl. Stat. 16(4): 2554-2576, December 2022. https://doi.org/10.1214/22-AOAS1603

Information

Received: 1 July 2020; Revised: 1 October 2021; Published: December 2022
First available in Project Euclid: 26 September 2022

MathSciNet: MR4489223
zbMATH: 1498.62204
Digital Object Identifier: 10.1214/22-AOAS1603

Keywords: Bayesian mixture models, proteomics, semi-supervised learning

Rights: Copyright © 2022 Institute of Mathematical Statistics

Vol. 16 • No. 4 • December 2022