Pan-disease clustering analysis of the trend of period prevalence

Sneha Jadhav; Chenjin Ma; Yefei Jiang; Ben-Chang Shia; Shuangge Ma

doi:10.1214/21-AOAS1470

Abstract

Prevalence is of essential importance in biomedical and public health research. In the “classic” paradigm it has been studied for each disease individually. Accumulating evidence has shown that diseases can be “correlated.” Joint analysis of prevalence can potentially provide important insights beyond individual-disease analysis but has not been well pursued. In this study we take advantage of the unique Taiwan National Health Insurance Research Database (NHIRD) and conduct the first pan-disease analysis of period prevalence trend. The goal is to identify clusters within which diseases have similar period prevalence trends. A novel penalization pursuit approach is applied which has an intuitive formulation and preferable numerical performance. In data analysis the period prevalence values are computed using the records on close to one million subjects and 14 years of observation. With 405 diseases, 35 clusters with sizes larger than one and 27 clusters with sizes one are identified. The clustering results have sound interpretations and differ significantly from those of the alternatives.

Funding Statement

The study was supported by NSF Grant 1916251 and a Yale Cancer Center Pilot Award.

Acknowledgments

We thank the Editor, Associate Editor and reviewers for their careful reviews and insightful comments.

Jadhav and Ma are joint first authors and made equal contributions.

Citation

Sneha Jadhav, Chenjin Ma, Yefei Jiang, Ben-Chang Shia, Shuangge Ma. "Pan-disease clustering analysis of the trend of period prevalence." Ann. Appl. Stat. 15 (4) 1945-1958, December 2021. https://doi.org/10.1214/21-AOAS1470

Information

Received: 1 May 2020; Revised: 1 January 2021; Published: December 2021

First available in Project Euclid: 21 December 2021

MathSciNet: MR4355083

zbMATH: 1498.62225

Digital Object Identifier: 10.1214/21-AOAS1470

Keywords: disease prevalence, functional clustering, NHIRD, temporal trends
