A comparison of principal component methods between multiple phenotype regression and multiple SNP regression in genetic association studies

Zhonghua Liu; Ian Barnett; Xihong Lin

doi:10.1214/19-AOAS1312

Ann. Appl. Stat. 14(1): 433-451 (March 2020). DOI: 10.1214/19-AOAS1312

Abstract

Principal component analysis (PCA) is a popular method for dimension reduction in unsupervised multivariate analysis. However, existing ad hoc uses of PCA in both multivariate regression (multiple outcomes) and multiple regression (multiple predictors) lack theoretical justification. The differences in the statistical properties of PCAs in these two regression settings are not well understood. In this paper, we provide theoretical results on the power of PCA in genetic association testing in both multiple phenotype and SNP-set settings. The multiple phenotype setting refers to the case when one is interested in studying the association between a single SNP and multiple phenotypes as outcomes. The SNP-set setting refers to the case when one is interested in studying the association between multiple SNPs in a SNP set and a single phenotype as the outcome. We demonstrate analytically that the properties of the PC-based analysis in these two regression settings are substantially different. We show that the lower-order PCs, that is, PCs with large eigenvalues, are generally preferred and lead to a higher power in the SNP-set setting, while the higher-order PCs, that is, PCs with small eigenvalues, are generally preferred in the multiple phenotype setting. We also investigate the power of three other popular statistical methods, the Wald test, the variance component test and the minimum $p$-value test, in both multiple phenotype and SNP-set settings. We use theoretical power, simulation studies, and two real data analyses to validate our findings.
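As a rough illustration of the two settings described in the abstract, the following Python sketch computes PCs in each setting and a per-PC association statistic. It is not the authors' procedure: the helper names (`pc_scores`, `slope_t_stat`), the simulated data, the effect sizes, and the use of a simple marginal regression t-statistic for each PC are all assumptions made for this example.

```python
import numpy as np

def pc_scores(M):
    """Return eigenvalues (descending) and PC scores of the standardized columns of M."""
    Z = (M - M.mean(axis=0)) / M.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # PC1 = largest eigenvalue
    return eigvals[order], Z @ eigvecs[:, order]

def slope_t_stat(y, x):
    """t-statistic for the slope in the simple regression of y on x (with intercept)."""
    n = len(y)
    xc, yc = x - x.mean(), y - y.mean()
    beta = (xc @ yc) / (xc @ xc)
    resid = yc - beta * xc
    se = np.sqrt((resid @ resid) / (n - 2) / (xc @ xc))
    return beta / se

rng = np.random.default_rng(0)
n, k = 500, 4  # hypothetical sample size and number of phenotypes / SNPs

# Multiple phenotype setting: one SNP, k correlated phenotypes as outcomes.
snp = rng.binomial(2, 0.3, size=n).astype(float)
phenos = 0.8 * rng.normal(size=(n, 1)) + 0.6 * rng.normal(size=(n, k))
phenos[:, 0] += 0.2 * snp                      # assumed effect on one phenotype
pheno_eigvals, pheno_pcs = pc_scores(phenos)   # PCs of the outcomes
pheno_t = np.array([slope_t_stat(pheno_pcs[:, j], snp) for j in range(k)])

# SNP-set setting: k correlated SNPs as predictors, one phenotype as outcome.
def haplotypes():
    shared = np.sqrt(0.5) * rng.normal(size=(n, 1))
    return (shared + np.sqrt(0.5) * rng.normal(size=(n, k)) > 0.5).astype(float)

genos = haplotypes() + haplotypes()            # genotypes coded 0/1/2
y = 0.2 * genos[:, 0] + rng.normal(size=n)     # assumed effect of one SNP
snp_eigvals, snp_pcs = pc_scores(genos)        # PCs of the predictors
snp_t = np.array([slope_t_stat(y, snp_pcs[:, j]) for j in range(k)])

print("phenotype PCs: eigenvalues", pheno_eigvals.round(2), "t", pheno_t.round(2))
print("SNP-set PCs:   eigenvalues", snp_eigvals.round(2), "t", snp_t.round(2))
```

In the multiple phenotype setting the PCs are built from the outcomes and each PC is regressed on the SNP, whereas in the SNP-set setting the PCs are built from the predictors and the phenotype is regressed on each PC. A single simulated data set like this one only shows the mechanics; it does not by itself demonstrate the power results stated in the abstract.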

Citation

Zhonghua Liu, Ian Barnett, Xihong Lin. "A comparison of principal component methods between multiple phenotype regression and multiple SNP regression in genetic association studies." Ann. Appl. Stat. 14 (1) 433-451, March 2020. https://doi.org/10.1214/19-AOAS1312

Information

Received: 1 June 2018; Revised: 1 November 2019; Published: March 2020

First available in Project Euclid: 16 April 2020

MathSciNet: MR4085100

zbMATH: 07200178

Digital Object Identifier: 10.1214/19-AOAS1312

Keywords: Dimension reduction, eigen-values, Hypothesis testing, minimum $p$-value test, multiple phenotypes, Principal Component Analysis, SNP-set, variance-component test
