Open Access
December 2024
Predicting milk traits from spectral data using Bayesian probabilistic partial least squares regression
Szymon Urbas, Pierre Lovera, Robert Daly, Alan O’Riordan, Donagh Berry, Isobel Claire Gormley
Ann. Appl. Stat. 18(4): 3486-3506 (December 2024). DOI: 10.1214/24-AOAS1947

Abstract

High-dimensional spectral data—routinely generated in dairy production—are used to predict a range of traits in milk products. Partial least squares (PLS) regression is ubiquitously used for these prediction tasks. However, PLS regression is not typically viewed as arising from a probabilistic model, and parameter uncertainty is rarely quantified. Additionally, PLS regression does not easily lend itself to model-based modifications, coherent prediction intervals are not readily available, and the process of choosing the latent-space dimension, Q, can be subjective and sensitive to data size.

We introduce a Bayesian latent-variable model, emulating the desirable properties of PLS regression while accounting for parameter uncertainty in prediction. The need to choose Q is eschewed through a nonparametric shrinkage prior. The flexibility of the proposed Bayesian partial least squares (BPLS) regression framework is exemplified by considering sparsity modifications and allowing for multivariate response prediction.

The BPLS regression framework is used in two motivating settings: (1) multivariate trait prediction from mid-infrared spectral analyses of milk samples and (2) milk pH prediction from surface-enhanced Raman spectral data. The prediction performance of BPLS regression at least matches that of PLS regression. Additionally, the provision of correctly calibrated prediction intervals objectively provides richer, more informative inference for stakeholders in dairy production.

Funding Statement

This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) and the Department of Agriculture, Food and Marine on behalf of the Government of Ireland under grant number (16/RC/3835) and the SFI Insight Research Centre under grant number (SFI/12/RC/2289_P2).

Acknowledgments

The authors would like to thank the members of the VistaMilk Science Foundation Ireland (SFI) Research Center and, in particular, the members of the VistaMilk Spectroscopy Working Group for discussions that contributed to this work.

The authors are grateful to the Editors and the two anonymous reviewers for comments and suggestions that have materially improved the article.

Citation

Szymon Urbas, Pierre Lovera, Robert Daly, Alan O’Riordan, Donagh Berry, Isobel Claire Gormley. "Predicting milk traits from spectral data using Bayesian probabilistic partial least squares regression." Ann. Appl. Stat. 18 (4) 3486-3506, December 2024. https://doi.org/10.1214/24-AOAS1947

Information

Received: 1 July 2023; Revised: 1 July 2024; Published: December 2024
First available in Project Euclid: 31 October 2024

Digital Object Identifier: 10.1214/24-AOAS1947

Keywords: Bayesian factor analysis, High-dimensional statistics, Milk trait prediction, Partial least squares, Spectral data

Rights: Copyright © 2024 Institute of Mathematical Statistics
