Abstract
High-dimensional spectral data, routinely generated in dairy production, are used to predict a range of traits in milk products. Partial least squares (PLS) regression is ubiquitously used for these prediction tasks. However, PLS regression is not typically viewed as arising from a probabilistic model, and parameter uncertainty is rarely quantified. Additionally, PLS regression does not easily lend itself to model-based modifications, coherent prediction intervals are not readily available, and the process of choosing the latent-space dimension can be subjective and sensitive to data size.
We introduce a Bayesian latent-variable model, emulating the desirable properties of PLS regression while accounting for parameter uncertainty in prediction. The need to choose the latent-space dimension is eschewed through a nonparametric shrinkage prior. The flexibility of the proposed Bayesian partial least squares (BPLS) regression framework is exemplified by considering sparsity modifications and allowing for multivariate response prediction.
The BPLS regression framework is used in two motivating settings: (1) multivariate trait prediction from mid-infrared spectral analyses of milk samples and (2) milk pH prediction from surface-enhanced Raman spectral data. The prediction performance of BPLS regression at least matches that of PLS regression. Additionally, the provision of correctly calibrated prediction intervals objectively provides richer, more informative inference for stakeholders in dairy production.
Funding Statement
This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) and the Department of Agriculture, Food and Marine on behalf of the Government of Ireland under grant number (16/RC/3835) and the SFI Insight Research Centre under grant number (SFI/12/RC/2289_P2).
Acknowledgments
The authors would like to thank the members of the VistaMilk Science Foundation Ireland (SFI) Research Center and, in particular, the members of the VistaMilk Spectroscopy Working Group for discussions that contributed to this work.
The authors are grateful to the Editors and the two anonymous reviewers for comments and suggestions that have materially improved the article.
Citation
Szymon Urbas, Pierre Lovera, Robert Daly, Alan O’Riordan, Donagh Berry, Isobel Claire Gormley. "Predicting milk traits from spectral data using Bayesian probabilistic partial least squares regression." Ann. Appl. Stat. 18 (4) 3486-3506, December 2024. https://doi.org/10.1214/24-AOAS1947