Statistical Science

Discovering Disease Genes: Multipoint Linkage Analysis via a New Markov Chain Monte Carlo Approach

A. W. George and E. A. Thompson

Source: Statist. Sci. Volume 18, Number 4 (2003), 515-531.

Abstract

Multipoint linkage analyses of data collected on related individuals are often performed as a first step in the discovery of disease genes. Through the dependence in inheritance of genes segregating at several linked loci, multipoint linkage analysis detects and localizes chromosomal regions (called trait loci) which contain disease genes. Our ability to correctly detect and position these trait loci is increased with the analysis of data observed on large pedigrees and multiple genetic markers. However, large pedigrees generally contain substantial missing data and exact calculation of the required multipoint likelihoods quickly becomes intractable. In this paper, we present a new Markov chain Monte Carlo approach to multipoint linkage analysis which greatly extends the range of models and data sets for which analysis is practical. Several advances in Markov chain Monte Carlo theory, namely joint updates of latent variables across loci or meioses, integrated proposals, Metropolis-Hastings restarts via sequential imputation and Rao-Blackwellized estimators, are incorporated into a sampling strategy which mixes well and produces accurate results in real time. The methodology is demonstrated through its application to several data sets originating from a study of early-onset Alzheimer's disease in families of Volga-German ethnic origin.
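As a rough illustration of one ingredient named above, the following toy sketch contrasts a naive Gibbs estimate with a Rao-Blackwellized one. It is not the authors' sampler: it assumes a single meiosis with binary inheritance indicators at two linked loci, a recombination fraction `theta`, and made-up per-locus likelihoods `lik1` and `lik2`. All function names and numbers are hypothetical and chosen only so that the exact answer can be checked by enumeration.

```python
import random

def joint(s1, s2, theta, lik1, lik2):
    """Unnormalized P(S1=s1, S2=s2, Y) for the toy two-locus model."""
    trans = theta if s1 != s2 else 1.0 - theta  # recombination between loci
    return 0.5 * trans * lik1[s1] * lik2[s2]

def cond_prob_one(other, theta, lik):
    """P(S=1 | indicator at the other locus, data at this locus)."""
    w1 = (theta if other != 1 else 1.0 - theta) * lik[1]
    w0 = (theta if other != 0 else 1.0 - theta) * lik[0]
    return w1 / (w0 + w1)

def gibbs(n_iter, theta, lik1, lik2, seed=0):
    """Estimate P(S2=1 | Y) two ways from one Gibbs run."""
    rng = random.Random(seed)
    s1, s2 = 0, 0
    naive_sum, rb_sum = 0.0, 0.0
    for _ in range(n_iter):
        # Update S1 from its full conditional given S2.
        s1 = 1 if rng.random() < cond_prob_one(s2, theta, lik1) else 0
        # Full conditional of S2 given the new S1.
        p2 = cond_prob_one(s1, theta, lik2)
        rb_sum += p2              # Rao-Blackwellized: average the probability
        s2 = 1 if rng.random() < p2 else 0
        naive_sum += s2           # naive: average the sampled indicator
    return naive_sum / n_iter, rb_sum / n_iter

def exact(theta, lik1, lik2):
    """P(S2=1 | Y) by enumerating all four latent states."""
    num = sum(joint(s1, 1, theta, lik1, lik2) for s1 in (0, 1))
    den = sum(joint(s1, s2, theta, lik1, lik2)
              for s1 in (0, 1) for s2 in (0, 1))
    return num / den

if __name__ == "__main__":
    theta, lik1, lik2 = 0.1, [0.2, 0.8], [0.5, 0.5]  # illustrative values only
    naive, rb = gibbs(20000, theta, lik1, lik2)
    print("exact:", exact(theta, lik1, lik2))
    print("naive:", naive, "Rao-Blackwellized:", rb)
```

Both estimators target the same posterior probability. The Rao-Blackwellized version averages conditional probabilities rather than sampled indicators, which typically reduces Monte Carlo variance. On realistic pedigrees the latent space is far too large to enumerate, so an exact check like `exact` is only available in toy settings such as this one.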

Keywords: Linkage analysis; joint Gibbs updates; integrated proposals; Metropolis-Hastings restarts; sequential imputation


Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.ss/1081443233
Digital Object Identifier: doi:10.1214/ss/1081443233
Mathematical Reviews number (MathSciNet): MR2059328
Zentralblatt MATH identifier: 1055.62121


2010 © Institute of Mathematical Statistics