June 2024
On blockwise and reference panel-based estimators for genetic data prediction in high dimensions
Bingxin Zhao, Shurong Zheng, Hongtu Zhu
Ann. Statist. 52(3): 948-965 (June 2024). DOI: 10.1214/24-AOS2378

Abstract

Genetic prediction holds immense promise for translating genetic discoveries into medical advances. As the high-dimensional covariance matrix (or the linkage disequilibrium (LD) pattern) of genetic variants often presents a block-diagonal structure, numerous methods account for the dependence among variants in predetermined local LD blocks. Moreover, due to privacy considerations and data protection concerns, genetic variant dependence in each LD block is typically estimated from external reference panels rather than the original training data set. This paper presents a unified analysis of blockwise and reference panel-based estimators in a high-dimensional prediction framework without sparsity restrictions. We find that, surprisingly, even when the covariance matrix has a block-diagonal structure with well-defined boundaries, blockwise estimation methods adjusting for local dependence can be substantially less accurate than methods controlling for the whole covariance matrix. Further, estimation methods built on the original training data set and external reference panels are likely to have varying performance in high dimensions, which may reflect the cost of having access only to summary-level data from the training data set. This analysis is based on novel results in random matrix theory for block-diagonal covariance matrices. We numerically evaluate our results using extensive simulations and real data analysis in the UK Biobank.
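
To make the three kinds of estimator in the abstract concrete, the following is a minimal simulation sketch. It is not the authors' code and does not reproduce their estimators or results. Everything in it is an assumption made for illustration: the ridge-type form of the estimators, the AR(1) correlation within each LD block, the block sizes, the sample sizes, the penalty `lam`, and all function names. It contrasts (a) an estimator that uses the whole sample covariance of the training data, (b) a blockwise estimator that uses training-data covariance within each block only, and (c) a blockwise estimator whose within-block covariance comes from an independent reference panel.

```python
import numpy as np


def ar1_block(size, rho):
    """AR(1) correlation matrix used as a stand-in for one LD block (assumption)."""
    idx = np.arange(size)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def simulate(n, blocks, rho, beta, noise_sd, rng):
    """Draw n samples whose covariance is block-diagonal with the given block sizes."""
    X = np.hstack([
        rng.multivariate_normal(np.zeros(b), ar1_block(b, rho), size=n)
        for b in blocks
    ])
    y = X @ beta + noise_sd * rng.standard_normal(n)
    return X, y


def full_ridge(X, y, lam):
    """Ridge-type estimator controlling for the whole sample covariance matrix."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)


def blockwise_ridge(X_ld, z, blocks, lam):
    """Ridge-type estimator applied block by block.

    X_ld supplies the within-block covariance (training data or a reference
    panel); z holds the marginal summary statistics X'y / n from training.
    """
    n = X_ld.shape[0]
    est = np.empty_like(z)
    start = 0
    for b in blocks:
        sl = slice(start, start + b)
        S = X_ld[:, sl].T @ X_ld[:, sl] / n
        est[sl] = np.linalg.solve(S + lam * np.eye(b), z[sl])
        start += b
    return est


rng = np.random.default_rng(0)
blocks = [20] * 10                     # ten LD blocks of 20 variants each
p, n, lam, rho = sum(blocks), 150, 0.5, 0.6
beta = rng.standard_normal(p) / np.sqrt(p)   # dense effects, no sparsity

X, y = simulate(n, blocks, rho, beta, 1.0, rng)       # training data
W, _ = simulate(n, blocks, rho, beta, 1.0, rng)       # reference panel
X_test, y_test = simulate(500, blocks, rho, beta, 1.0, rng)

z = X.T @ y / n                        # summary statistics from training data
estimates = {
    "full (training)": full_ridge(X, y, lam),
    "blockwise (training)": blockwise_ridge(X, z, blocks, lam),
    "blockwise (reference panel)": blockwise_ridge(W, z, blocks, lam),
}
for name, est in estimates.items():
    r = np.corrcoef(X_test @ est, y_test)[0, 1]
    print(f"{name}: out-of-sample correlation = {r:.3f}")
```

The printed correlations depend on the seed and on the arbitrary settings above, so a single run should not be read as evidence for or against the paper's conclusions. Treating all variants as one block makes the blockwise estimator coincide with the full one, which is a useful sanity check on the implementation.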

Funding Statement

The first author was supported by the National Institute on Aging of the National Institutes of Health under Award RF1AG082938. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
The first author was also supported in part by funding from the Purdue University Statistics Department and the Department of Statistics and Data Science at the University of Pennsylvania.
The second author was supported in part by NSFC Grants 12326606, 12231011 and 12071066.
The third author was supported in part by NIH Grants RF1AG082938, U01AG079847 and R01AR082684.

Acknowledgments

We would like to thank Ziliang Zhu, Fei Zou and Yue Yang for helpful discussions during the early stages of this paper. We would also like to thank the anonymous referees, an Associate Editor and the Editor for their constructive comments that substantially improved the quality of this paper.

This research has been conducted using the UK Biobank resources (application number 76139), subject to a data transfer agreement. We thank the individuals represented in the UK Biobank for their participation and the research teams for their work in collecting, processing and disseminating these data sets for analysis.

The majority of the work was conducted when the first author was at the University of North Carolina at Chapel Hill and Purdue University. We would like to thank their research computing groups for providing computational resources and support that have contributed to these research results.

Citation

Bingxin Zhao, Shurong Zheng, Hongtu Zhu. "On blockwise and reference panel-based estimators for genetic data prediction in high dimensions." Ann. Statist. 52 (3) 948-965, June 2024. https://doi.org/10.1214/24-AOS2378

Information

Received: 1 April 2022; Revised: 1 February 2024; Published: June 2024
First available in Project Euclid: 11 August 2024

Digital Object Identifier: 10.1214/24-AOS2378

Subjects:
Primary: 62J05
Secondary: 60B20

Keywords: block-diagonal covariance matrix, high-dimensional prediction, linkage disequilibrium, random matrix theory, reference panel

Rights: Copyright © 2024 Institute of Mathematical Statistics

Vol. 52 • No. 3 • June 2024