June 2021
Prediction of the NASH through penalized mixture of logistic regression models
Marie Morvan, Emilie Devijver, Madison Giacofci, Valérie Monbet
Ann. Appl. Stat. 15(2): 952-970 (June 2021). DOI: 10.1214/20-AOAS1409


In this paper, an appropriate and interpretable statistical diagnosis model is proposed to predict nonalcoholic steatohepatitis (NASH) from near-infrared spectrometry data. In this disease, unknown patient profiles are expected to lead to different diagnoses. The model must therefore account for both the heterogeneity of the data and the dimension of the spectrometric data.

To this end, we propose to fit a mixture model on the joint distribution of the binary diagnostic variable and the covariates selected in the spectra. The penalized maximum likelihood estimator is considered. In practice, a twofold penalty is imposed on both the regression coefficients and the covariance parameters. Automatic selection criteria, such as the AIC and BIC, are used to select the amount of shrinkage and the number of clusters. The performance of the overall procedure is evaluated in a simulation study, and its application to the NASH data set is analyzed. The model achieves better prediction performance than competing methods and provides highly interpretable results.
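As a rough illustration of the kind of objective described above, the following sketch evaluates a penalized joint log-likelihood for a mixture in which each cluster has Gaussian covariates and its own logistic regression for the diagnosis. The abstract does not give the exact form of the model or of the penalty, so several choices here are assumptions made only for illustration: the Gaussian form of the covariates, the L1 penalty on the regression coefficients, the L1 penalty on the entries of the inverse covariance matrices, and all function and parameter names (`penalized_loglik`, `lam`, `rho`, and so on). It is not the authors' implementation.

```python
import numpy as np


def log_gaussian(X, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def log_bernoulli(y, X, intercept, beta):
    """Log probability of binary labels y under a logistic regression."""
    eta = intercept + X @ beta
    # y * eta - log(1 + exp(eta)), computed stably with logaddexp
    return y * eta - np.logaddexp(0.0, eta)


def penalized_loglik(X, y, weights, means, covs, intercepts, betas, lam, rho):
    """Joint mixture log-likelihood of (y, X) minus an assumed twofold penalty.

    Assumed penalty (not taken from the source): lam * L1 norm of the
    regression coefficients plus rho * L1 norm of all entries of each
    inverse covariance matrix (diagonal included).
    """
    # One column per cluster: log(pi_k) + log p(x | k) + log p(y | x, k)
    comp = np.stack(
        [
            np.log(w) + log_gaussian(X, m, S) + log_bernoulli(y, X, b0, b)
            for w, m, S, b0, b in zip(weights, means, covs, intercepts, betas)
        ],
        axis=1,
    )
    # Log-sum-exp over clusters, then sum over observations
    mx = comp.max(axis=1, keepdims=True)
    loglik = (mx[:, 0] + np.log(np.exp(comp - mx).sum(axis=1))).sum()
    penalty = lam * sum(np.abs(b).sum() for b in betas)
    penalty += rho * sum(np.abs(np.linalg.inv(S)).sum() for S in covs)
    return loglik - penalty


def aic(loglik, n_params):
    """Akaike information criterion (lower is better)."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_samples):
    """Bayesian information criterion (lower is better)."""
    return -2.0 * loglik + n_params * np.log(n_samples)
```

In a selection loop, one would fit the model for each candidate number of clusters and each penalty level, compute the unpenalized log-likelihood at the fitted parameters, and keep the configuration with the smallest `aic` or `bic` value. How the number of free parameters is counted under shrinkage is a modelling choice that this sketch leaves to the caller.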

Funding Statement

Part of this work was supported by the CNRS and AMIES institutions through the exploratory projects PEPS I3A AppSpec and PEPS I3A STATOPO.


Acknowledgments

The authors are grateful to Rodolphe Anty (Centre Hospitalier Universitaire de Nice) and the hepatology unit for providing the data set, and to Diafir, in particular Hugues Tariel and Maëna Le Corvec, for the spectrometric measurements. The authors also thank Olivier Loréal (INSERM, Univ. Rennes) for his valuable medical expertise and helpful suggestions.


Citation

Marie Morvan, Emilie Devijver, Madison Giacofci, Valérie Monbet. "Prediction of the NASH through penalized mixture of logistic regression models." Ann. Appl. Stat. 15 (2) 952-970, June 2021. https://doi.org/10.1214/20-AOAS1409


Received: 1 November 2019; Revised: 1 October 2020; Published: June 2021
First available in Project Euclid: 12 July 2021

MathSciNet: MR4298952
zbMATH: 1477.62318
Digital Object Identifier: 10.1214/20-AOAS1409

Keywords: heterogeneous data, mixture regression model, prediction, spectrometry data, variable selection

Rights: Copyright © 2021 Institute of Mathematical Statistics


