Open Access
A spectral graph approach to discovering genetic ancestry
Ann B. Lee, Diana Luca, Kathryn Roeder
Ann. Appl. Stat. 4(1): 179-202 (March 2010). DOI: 10.1214/09-AOAS281


Mapping human genetic variation is of fundamental interest in fields such as anthropology and forensic inference. At the same time, patterns of genetic diversity confound efforts to determine the genetic basis of complex disease. Owing to technological advances, it is now possible to measure hundreds of thousands of genetic variants per individual across the genome. Principal component analysis (PCA) is routinely used to summarize the genetic similarity between subjects, and the eigenvectors are interpreted as dimensions of ancestry. We build on this idea using a spectral graph approach, drawing on connections between multidimensional scaling and spectral kernel methods. Our approach, based on a spectral embedding derived from the normalized Laplacian of a graph, can produce a more meaningful delineation of ancestry than PCA. The method is stable in the presence of outliers and can incorporate different similarity measures of genetic data more easily than PCA. We illustrate a new algorithm for genetic clustering and association analysis on a large, genetically heterogeneous sample.
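
As a rough illustration of the idea of a spectral embedding derived from the normalized Laplacian of a graph, the following Python sketch may help. It is not the authors' algorithm: the function name spectral_embedding, the toy 6-by-6 similarity matrix, and the choice of the symmetric normalized Laplacian I - D^{-1/2} W D^{-1/2} are all assumptions made for illustration. The toy matrix stands in for pairwise genetic similarities between six hypothetical individuals drawn from two subpopulations.

```python
import numpy as np


def spectral_embedding(weights, n_components=1):
    """Embed graph nodes with eigenvectors of the normalized Laplacian.

    `weights` is a symmetric (n, n) array of non-negative similarities
    with strictly positive row sums. This is an illustrative sketch,
    not the procedure from the paper.
    """
    W = np.asarray(weights, dtype=float)
    degrees = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degrees)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    laplacian = np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    # Skip the first eigenvector (eigenvalue near 0), which carries no
    # group information; keep the next n_components as coordinates.
    return eigvals[1:n_components + 1], eigvecs[:, 1:n_components + 1]


# Hypothetical similarity matrix: two groups of three individuals,
# with strong within-group similarity and weak between-group similarity.
group = np.array([0, 0, 0, 1, 1, 1])
same_group = group[:, None] == group[None, :]
similarity = np.where(same_group, 1.0, 0.01)

eigenvalues, embedding = spectral_embedding(similarity, n_components=1)

# The sign of the single embedding coordinate separates the two groups.
for label, coord in zip(group, embedding[:, 0]):
    print(label, "positive" if coord > 0 else "negative")
```

In this toy case the first nontrivial eigenvector takes one sign on the first three individuals and the opposite sign on the last three, so a single coordinate recovers the two groups. With real genotype data the similarity measure and the number of retained dimensions would need to be chosen with care.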




Published: March 2010
First available in Project Euclid: 11 May 2010

zbMATH: 1189.62170
MathSciNet: MR2758169
Digital Object Identifier: 10.1214/09-AOAS281

Keywords: dimension reduction, human genetics, multidimensional scaling, population structure, spectral embedding

Rights: Copyright © 2010 Institute of Mathematical Statistics
