Abstract
Prevalence is of essential importance in biomedical and public health research. In the “classic” paradigm, it has been studied for each disease individually. Accumulating evidence has shown that diseases can be “correlated.” Joint analysis of prevalence can potentially provide important insights beyond individual-disease analysis but has not been well pursued. In this study, we take advantage of the unique Taiwan National Health Insurance Research Database (NHIRD) and conduct the first pan-disease analysis of period prevalence trends. The goal is to identify clusters within which diseases have similar period prevalence trends. A novel penalization pursuit approach is applied, which has an intuitive formulation and favorable numerical performance. In the data analysis, the period prevalence values are computed using records on close to one million subjects over 14 years of observation. Among 405 diseases, 35 clusters of size larger than one and 27 clusters of size one are identified. The clustering results have sound interpretations and differ significantly from those of the alternatives.
Funding Statement
The study was supported by NSF Grant 1916251 and a Yale Cancer Center Pilot Award.
Acknowledgments
We thank the Editor, Associate Editor and reviewers for their careful reviews and insightful comments.
Jadhav and Ma are joint first authors and made equal contributions.
Citation
Sneha Jadhav, Chenjin Ma, Yefei Jiang, Ben-Chang Shia, Shuangge Ma. "Pan-disease clustering analysis of the trend of period prevalence." Ann. Appl. Stat. 15 (4) 1945-1958, December 2021. https://doi.org/10.1214/21-AOAS1470