Statistical Science

Population Structure and Cryptic Relatedness in Genetic Association Studies

William Astle and David J. Balding

We review the problem of confounding in genetic association studies, which arises principally because of population structure and cryptic relatedness. Many treatments of the problem consider only a simple “island” model of population structure. We take a broader approach, which views population structure and cryptic relatedness as different aspects of a single confounder: the unobserved pedigree defining the (often distant) relationships among the study subjects. Kinship is therefore a central concept, and we review methods of defining and estimating kinship coefficients, both pedigree-based and marker-based. In this unified framework we review solutions to the problem of population structure, including family-based study designs, genomic control, structured association, regression control, principal components adjustment and linear mixed models. The last solution makes the most explicit use of the kinships among the study subjects, and has an established role in the analysis of animal and plant breeding studies. Recent computational developments mean that analyses of human genetic association data are beginning to benefit from its powerful tests for association, which protect against population structure and cryptic kinship, as well as intermediate levels of confounding by the pedigree.

Statist. Sci. Volume 24, Number 4 (2009), 451-471.

First available: 20 April 2010

Astle, William; Balding, David J. Population Structure and Cryptic Relatedness in Genetic Association Studies. Statistical Science 24 (2009), no. 4, 451-471. doi:10.1214/09-STS307.

