Pan-disease clustering analysis of the trend of period prevalence
Sneha Jadhav, Chenjin Ma, Yefei Jiang, Ben-Chang Shia, Shuangge Ma
Ann. Appl. Stat. 15(4): 1945-1958 (December 2021). DOI: 10.1214/21-AOAS1470


Prevalence is of essential importance in biomedical and public health research. In the “classic” paradigm, it has been studied for each disease individually. Accumulating evidence has shown that diseases can be “correlated.” Joint analysis of prevalence can potentially provide important insights beyond individual-disease analysis but has not been well pursued. In this study, we take advantage of the unique Taiwan National Health Insurance Research Database (NHIRD) and conduct the first pan-disease analysis of period prevalence trends. The goal is to identify clusters within which diseases have similar period prevalence trends. A novel penalization pursuit approach is applied, which has an intuitive formulation and favorable numerical performance. In the data analysis, the period prevalence values are computed using records on close to one million subjects over 14 years of observation. With 405 diseases, 35 clusters of size larger than one and 27 clusters of size one are identified. The clustering results have sound interpretations and differ significantly from those of the alternatives.

Funding Statement

The study was supported by NSF Grant 1916251 and a Yale Cancer Center Pilot Award.


Acknowledgments

We thank the Editor, Associate Editor and reviewers for their careful reviews and insightful comments.

Jadhav and Ma are joint first authors and made equal contributions.


Citation

Sneha Jadhav, Chenjin Ma, Yefei Jiang, Ben-Chang Shia, Shuangge Ma. "Pan-disease clustering analysis of the trend of period prevalence." Ann. Appl. Stat. 15 (4) 1945-1958, December 2021.


Received: 1 May 2020; Revised: 1 January 2021; Published: December 2021
First available in Project Euclid: 21 December 2021

MathSciNet: MR4355083
zbMATH: 1498.62225
Digital Object Identifier: 10.1214/21-AOAS1470

Keywords: disease prevalence, functional clustering, NHIRD, temporal trends

Rights: Copyright © 2021 Institute of Mathematical Statistics


