Abstract
Compositional data are frequently encountered in regression settings in data analysis. When both the responses and the predictors are compositional, most existing models rely on a family of log-ratio transformations to move the analysis from the simplex to the real space. This often makes the model harder to interpret. A transformation-free regression model was recently developed, but it allows only a single compositional predictor. However, many datasets include multiple compositional predictors of interest. Motivated by an application to hydrothermal liquefaction (HTL) data, a novel extension of this transformation-free regression model is proposed that allows two (or more) compositional predictors to be used via a latent variable mixture. A modified expectation-maximization algorithm is developed to estimate the model parameters, which are shown to have natural interpretations. Conformal inference is used to obtain prediction limits for the compositional response. The resulting methodology is applied to the HTL dataset. Extensions to multiple predictors are discussed.
Funding Statement
Nicholas Rios and Lingzhou Xue were supported in part by the National Science Foundation grants (DMS-2210775, CCF-2007823, DMS-1953189) and the National Institute of General Medical Sciences grant (1R01GM152812).
Xiang Zhan was supported in part by the National Natural Science Foundation of China (grant no. 12371287).
Acknowledgments
The authors would like to thank the referees, the Associate Editor, and the Editor for constructive comments that improved this article. Xiang Zhan is the corresponding author.
Citation
Nicholas Rios. Lingzhou Xue. Xiang Zhan. "A latent variable mixture model for composition-on-composition regression with application to chemical recycling." Ann. Appl. Stat. 18 (4) 3253–3273, December 2024. https://doi.org/10.1214/24-AOAS1935