Annals of Applied Statistics

Function-on-scalar quantile regression with application to mass spectrometry proteomics data

Yusha Liu, Meng Li, and Jeffrey S. Morris

Abstract

Mass spectrometry proteomics, characterized by spiky, spatially heterogeneous functional data, can be used to identify potential cancer biomarkers. Existing mass spectrometry analyses utilize mean regression to detect spectral regions that are differentially expressed across groups. However, given the interpatient heterogeneity that is a key hallmark of cancer, many biomarkers are present at aberrant levels in only a subset of cancer samples, not all of them. Differences in these biomarkers can easily be missed by mean regression but might be more easily detected by quantile-based approaches. Thus, we propose a unified Bayesian framework to perform quantile regression on functional responses. Our approach utilizes an asymmetric Laplace working likelihood, represents the functional coefficients with basis representations, which enable borrowing of strength from nearby locations, and places a global-local shrinkage prior on the basis coefficients to achieve adaptive regularization. Different types of basis transforms and continuous shrinkage priors can be used in our framework. A scalable Gibbs sampler is developed to generate posterior samples that can be used to perform Bayesian estimation and inference while accounting for multiple testing. Our framework performs quantile regression and coefficient regularization in a unified manner, allowing them to inform each other and leading to improvement in performance over competing methods, as demonstrated by simulation studies. We also introduce an adjustment procedure for the model to improve the frequentist properties of its posterior inference. We apply our model to identify proteomic biomarkers of pancreatic cancer that are differentially expressed for a subset of cancer patients compared to the normal controls and that were missed by previous mean-regression-based approaches. Supplementary Material for this article is available online.
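The link between the asymmetric Laplace working likelihood and quantile regression can be illustrated with a small scalar sketch. It is not the authors' method: it omits the functional response, the basis transform, the shrinkage prior and the Gibbs sampler. All names (check_loss, ald_neg_loglik, ald_location_mle), the simulated data and the grid search are assumptions made for illustration. The sketch shows that maximizing an asymmetric Laplace likelihood in its location parameter amounts to minimizing the quantile check loss, and that an upper quantile can reveal a group difference that is present in only a subset of samples and is diluted in the mean.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def ald_neg_loglik(y, mu, tau, sigma=1.0):
    """Negative log-likelihood of i.i.d. asymmetric Laplace data with
    location mu, scale sigma and skewness tau. Up to terms that do not
    involve mu, it equals the summed check loss divided by sigma."""
    y = np.asarray(y, dtype=float)
    return (y.size * (np.log(sigma) - np.log(tau * (1.0 - tau)))
            + check_loss(y - mu, tau).sum() / sigma)

def ald_location_mle(y, tau, step=0.01):
    """Grid search for the location that maximizes the working likelihood
    (a hypothetical helper; the grid stands in for a proper optimizer)."""
    grid = np.arange(y.min(), y.max() + step, step)
    nll = np.array([ald_neg_loglik(y, m, tau) for m in grid])
    return grid[np.argmin(nll)]

# Simulated intensities at one spectral location (assumed data):
# only about 20% of the cancer samples carry an elevated signal.
rng = np.random.default_rng(0)
n, tau = 2000, 0.9
control = rng.normal(0.0, 1.0, n)
aberrant = rng.random(n) < 0.2
cancer = rng.normal(0.0, 1.0, n) + 3.0 * aberrant

mean_diff = cancer.mean() - control.mean()
quantile_diff = ald_location_mle(cancer, tau) - ald_location_mle(control, tau)
print(f"difference in means:          {mean_diff:.2f}")
print(f"difference in 0.9 quantiles:  {quantile_diff:.2f}")
```

In this setup the location estimate from the working likelihood falls at the sample 0.9 quantile (up to the grid resolution), so the quantile contrast picks up the aberrant subgroup more strongly than the mean contrast does.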

Article information

Source
Ann. Appl. Stat., Volume 14, Number 2 (2020), 521-541.

Dates
Received: March 2019
Revised: October 2019
First available in Project Euclid: 29 June 2020

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1593449314

Digital Object Identifier
doi:10.1214/19-AOAS1319

Keywords
Bayesian hierarchical model; functional data analysis; functional response regression; global-local shrinkage; proteomic biomarker; quantile regression

Citation

Liu, Yusha; Li, Meng; Morris, Jeffrey S. Function-on-scalar quantile regression with application to mass spectrometry proteomics data. Ann. Appl. Stat. 14 (2020), no. 2, 521–541. doi:10.1214/19-AOAS1319. https://projecteuclid.org/euclid.aoas/1593449314


Supplemental materials

  • Supplement A to “Function-on-scalar quantile regression with application to mass spectrometry proteomics data”. We provided the pancreatic cancer mass spectrometry dataset and the related code, which are also available at https://github.com/MorrisStatLab/FunctionalQuantileRegression.
  • Supplement B to “Function-on-scalar quantile regression with application to mass spectrometry proteomics data”. We provided details of the MCMC sampling procedure, additional results from the data application and implementation details for the “FDboost” package.