Motivation: The existing synteny block reconstruction algorithms use anchors (e.g., orthologous genes) shared over all genomes to construct the synteny blocks for multiple genomes. This approach, while efficient for a few genomes, cannot be scaled to address the need to construct synteny blocks in the many mammalian genomes that are currently being sequenced. The problem is that the number of anchors shared among all genomes quickly decreases as the number of genomes increases. Another problem is that many genomes (plant genomes in particular) have undergone extensive duplications, which make decoding of genomic architecture and rearrangement analysis in plants difficult. The existing synteny block generation algorithms for plants do not address the issue of generating non-overlapping synteny blocks suitable for analyzing rearrangements and the evolutionary history of duplications.
Results: In this paper we present a new synteny block generation algorithm based on the A-Bruijn graph framework that overcomes these difficulties. We applied our algorithm to derive non-overlapping synteny blocks in Arabidopsis thaliana. We also generalized this approach to synteny block generation for multiple genomes. The algorithm was applied to human-mouse-rat-dog-chicken genomes and it is able to recover synteny blocks missed by algorithms requiring 5-way anchors.
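The shrinking pool of all-genome anchors described in the motivation can be illustrated with a small sketch. It is not the A-Bruijn graph algorithm from the paper. It only counts how many anchors survive as each genome is added, and the gene identifiers (g1 to g6) and per-genome anchor sets are invented toy data.

```python
def shared_anchor_counts(anchor_sets):
    """For k = 1..n, count the anchors present in every one of the first k genomes."""
    counts = []
    shared = None
    for anchors in anchor_sets:
        # Intersect the running set of shared anchors with the next genome's anchors.
        shared = set(anchors) if shared is None else shared & set(anchors)
        counts.append(len(shared))
    return counts


# Toy anchor sets (hypothetical gene IDs, not real orthology data).
genomes = {
    "human": {"g1", "g2", "g3", "g4", "g5", "g6"},
    "mouse": {"g1", "g2", "g3", "g4", "g5"},
    "rat": {"g1", "g2", "g3", "g5", "g6"},
    "dog": {"g1", "g3", "g4", "g6"},
    "chicken": {"g1", "g2", "g6"},
}

counts = shared_anchor_counts(genomes.values())
for name, count in zip(genomes, counts):
    print(f"after adding {name}: {count} anchors shared by all genomes so far")
```

In this toy input every genome keeps at least half of the human anchors, yet only one anchor is common to all five. That is the situation in which an algorithm requiring 5-way anchors would discard most of the available evidence.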
Genome sequencing studies to date have generally sought to assemble consensus genomes by merging sequence contributions from multiple homologous copies of each chromosome. With growing interest in genetic variations, however, there is a need for methods to separate these distinct contributions and assess how individual homologous chromosome copies differ from one another. An approach to this problem was developed using small sequence fragments derived from shotgun sequencing studies to determine the patterns of variations that co-occur on individual chromosomes. This has become known as the "haplotype assembly" problem. This review paper surveys results on the theory and algorithms for haplotype assembly. It first describes common abstractions of the problem. It then discusses some notable intractability results for different problem variants. It next examines a variety of combinatorial, statistical, and heuristic methods for assembling fragment data sets in practice. The review concludes with a discussion of recent directions in diploid genome sequencing and their implications for haplotype assembly in the future.
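The abstract does not say which abstractions the review covers. One formulation that is widely used in the haplotype assembly literature is minimum error correction (MEC): find a haplotype pair such that the fewest fragment entries must be flipped for every fragment to agree with one of the two haplotypes. The sketch below assumes biallelic sites coded as 0/1, a diploid genome in which the second haplotype is the complement of the first, and '-' for sites a fragment does not cover. The fragments are invented toy data, and the exhaustive search is only meant to make the objective concrete.

```python
from itertools import product


def mismatches(fragment, haplotype):
    """Count covered sites where the fragment disagrees with the haplotype."""
    return sum(1 for f, h in zip(fragment, haplotype) if f != "-" and f != h)


def mec_score(fragments, haplotype):
    """MEC cost: each fragment is charged against the closer of the haplotype and its complement."""
    complement = "".join("1" if c == "0" else "0" for c in haplotype)
    return sum(
        min(mismatches(f, haplotype), mismatches(f, complement)) for f in fragments
    )


def best_haplotype(fragments, n_sites):
    """Exhaustive search over all 2^n_sites haplotypes; feasible only for toy inputs."""
    candidates = ("".join(bits) for bits in product("01", repeat=n_sites))
    return min(candidates, key=lambda h: mec_score(fragments, h))


# Toy fragments over 4 variant sites; the last two fragments conflict at sites 3 and 4.
fragments = ["01--", "-10-", "10--", "--01", "-011"]
best = best_haplotype(fragments, 4)
print(best, mec_score(fragments, best))
```

The search space doubles with every added site, so brute force stops being usable almost immediately. Practical methods therefore rely on the heuristic, statistical, and combinatorial approaches the review goes on to survey.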
Because complex phenotypes are controlled by multiple loci, there is great interest in testing markers simultaneously instead of one by one. In this paper, we compare three model selection methods for genome-wide association studies using simulations: the Stochastic Search Variable Selection (SSVS), the Least Absolute Shrinkage and Selection Operator (LASSO) and the Elastic Net. We also apply the three methods to identify genetic variants that are associated with daunorubicin-induced cytotoxicity. The simulation studies were performed by using the genotype data of 60 unrelated individuals from the CEU population in the HapMap project. For the cytotoxicity data, we used 3,967,790 markers across the whole genome for 56 unrelated individuals from the CEU population. Using Sure Independence Screening as the pre-screening procedure, the SSVS gives a small model while the LASSO gives an intermediate-sized model and the Elastic Net provides a large model. The three models share many common markers although the model sizes are different. The model sizes are subject to various cutoffs and parameters. The SSVS outperforms the LASSO and the Elastic Net in simulation studies. We also demonstrate the ability of the SSVS, the LASSO, and the Elastic Net to handle the situation when the number of markers is larger than the number of samples.
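The pre-screening step can be sketched as follows. Sure Independence Screening ranks each marker by the strength of its marginal association with the phenotype and keeps only the top-ranked ones, so that the later model selection step works on a far smaller set. The abstract does not state the statistic or cutoff used, so this sketch assumes absolute Pearson correlation and a fixed number of markers to keep. The marker names (rs_a to rs_d), genotypes, and phenotype values are invented toy data.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # e.g. a monomorphic marker carries no signal
    return sxy / math.sqrt(sxx * syy)


def sis_screen(genotypes, phenotype, keep):
    """Return the `keep` markers with the largest absolute marginal correlation."""
    ranked = sorted(
        genotypes,
        key=lambda marker: abs(pearson(genotypes[marker], phenotype)),
        reverse=True,
    )
    return ranked[:keep]


# Toy data: allele counts (0, 1, 2) for six individuals at four hypothetical markers.
genotypes = {
    "rs_a": [0, 1, 2, 0, 1, 2],
    "rs_b": [1, 1, 1, 1, 1, 1],
    "rs_c": [0, 2, 1, 2, 0, 1],
    "rs_d": [2, 1, 0, 2, 0, 0],
}
phenotype = [1.0, 2.1, 2.9, 1.2, 2.0, 3.1]

selected = sis_screen(genotypes, phenotype, keep=2)
print(selected)
```

Each marker is scored independently, so the cost grows linearly with the number of markers, which is what makes screening practical with millions of markers and only dozens of samples. The trade-off is that a marker whose effect only appears jointly with others can be screened out before the SSVS, LASSO, or Elastic Net ever sees it.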
While structurally very different, protein and RNA molecules share an important attribute. The motions they undergo are strongly related to the function they perform. For example, many diseases such as Mad Cow disease or Alzheimer’s disease are associated with protein misfolding and aggregation. Similarly, RNA folding velocity may regulate the plasmid copy number, and RNA folding kinetics can regulate gene expression at the translational level. Knowledge of the stability, folding, kinetics and detailed mechanics of the folding process may help provide insight into how proteins and RNAs fold. In this paper, we present an overview of our work with a computational method we have adapted from robotic motion planning to study molecular motions. We have validated against experimental data and have demonstrated that our method can capture biological results such as stochastic folding pathways, population kinetics of various conformations, and relative folding rates. Thus, our method provides both a detailed view (e.g., individual pathways) and a global view (e.g., population kinetics, relative folding rates, and reaction coordinates) of energy landscapes of both proteins and RNAs. We have validated these techniques by showing that we observe the same relative folding rates as shown in experiments for structurally similar protein molecules that exhibit different folding behaviors. Our analysis has also been able to predict the same relative gene expression rate for wild-type MS2 phage RNA and three of its mutants.