Communications in Information & Systems

An Integer Programming Approach for the Selection of Tag SNPs Using Multi-allelic LD

Yang-Ho Chen and Ting Chen


Abstract

Single Nucleotide Polymorphisms (SNPs) are common among human populations. SNPs that are proximally located within a small human chromosome region are generally so strongly correlated that a subset of SNPs, termed tag SNPs, can provide enough information to infer neighboring SNPs. Such correlations are generally known as linkage disequilibrium (LD) and are measured either pair-wise, for example by $r^2$, or multi-to-one (multi-marker). For any given set of SNPs, a variety of algorithms have been proposed to identify a subset of tag SNPs from which the remaining SNPs can be inferred. This paper focuses on finding the minimum number of tag SNPs from which the remaining SNPs can be inferred through multi-allelic LD or pair-wise LD with a pre-defined $r^2$ threshold. We call this the optimal tag SNP selection problem. Although this problem is theoretically NP-hard, it can be formulated as an integer programming (IP) problem under a certain constraint, and the optimal solution can be efficiently found by our newly developed IPMarker program. In addition, the flexibility of the computational framework allows us to formulate and solve the problem of finding common tag SNPs for multiple populations that have different LD patterns. Various datasets, including ENCODE and the Major Histocompatibility Complex (MHC) region, were used to evaluate the performance of IPMarker. We also extended IPMarker to the whole-genome HapMap Phase I data. Results showed that IPMarker significantly reduces the number of tag SNPs required when compared to the most widely used program, Haploview, although a significantly longer running time is required. Thus, overall, genotyping a selected set of tag SNPs is the most cost-effective way to conduct large-scale genome-wide association studies.
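
To make the pair-wise version of the problem concrete, the following is a minimal sketch and not the IPMarker implementation. It assumes phased, biallelic haplotype data coded as 0/1, treats a SNP as covered when it is itself a tag or has $r^2$ at or above the threshold with some tag, and replaces the integer programming solver with an exhaustive search that is only practical for a handful of SNPs. The function names `r_squared` and `min_tag_snps`, the toy data, and the 0.8 threshold are illustrative assumptions; multi-allelic (multi-marker) LD is not modeled.

```python
from itertools import combinations


def r_squared(a, b):
    """Pair-wise r^2 between two biallelic SNPs given as 0/1 haplotype alleles."""
    n = len(a)
    pa = sum(a) / n                                  # frequency of allele 1 at SNP a
    pb = sum(b) / n                                  # frequency of allele 1 at SNP b
    pab = sum(x & y for x, y in zip(a, b)) / n       # frequency of the 1-1 haplotype
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0                                   # monomorphic SNP: treat as uncorrelated
    d = pab - pa * pb                                # LD coefficient D
    return d * d / denom


def min_tag_snps(snps, threshold=0.8):
    """Smallest set of SNP indices whose r^2 neighborhoods cover every SNP.

    Exhaustive search over subsets of increasing size (assumption: tiny input).
    This solves the same covering problem an IP would:
        minimize sum_i x_i
        subject to sum_{i : r^2(i, j) >= threshold} x_i >= 1 for every SNP j,
        x_i in {0, 1}.
    """
    m = len(snps)
    covers = [
        {j for j in range(m)
         if i == j or r_squared(snps[i], snps[j]) >= threshold}
        for i in range(m)
    ]
    for k in range(1, m + 1):
        for subset in combinations(range(m), k):
            covered = set().union(*(covers[i] for i in subset))
            if len(covered) == m:
                return list(subset)
    return []


# Toy data (hypothetical): 4 SNPs observed on 6 haplotypes.
snps = [
    [0, 0, 0, 1, 1, 1],  # SNP 0
    [0, 0, 0, 1, 1, 1],  # SNP 1: identical to SNP 0
    [1, 1, 1, 0, 0, 0],  # SNP 2: complement of SNP 0
    [0, 1, 0, 1, 0, 1],  # SNP 3: weakly correlated with the others
]
tags = min_tag_snps(snps, threshold=0.8)
print(tags)
```

In this toy example SNPs 0, 1, and 2 are in perfect LD with one another, so one of them can tag the other two, while SNP 3 must be selected as its own tag. Real regions contain far too many SNPs for exhaustive search, which is where an exact IP solver becomes relevant.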

Article information

Source
Commun. Inf. Syst. Volume 9, Number 3 (2009), 253-268.

Dates
First available in Project Euclid: 22 January 2010

Permanent link to this document
http://projecteuclid.org/euclid.cis/1264171156

Zentralblatt MATH identifier
05693246

Citation

Chen, Yang-Ho; Chen, Ting. An Integer Programming Approach for the Selection of Tag SNPs Using Multi-allelic LD. Communications in Information & Systems 9 (2009), no. 3, 253--268. http://projecteuclid.org/euclid.cis/1264171156.

