Decoding the Genomic Architecture of Mammalian and Plant Genomes: Synteny Blocks and Large-scale Duplications
Motivation: Existing synteny block reconstruction algorithms construct synteny blocks for multiple genomes from anchors (e.g., orthologous genes) that are shared by all of the genomes. This approach is efficient for a few genomes, but it does not scale to the many mammalian genomes that are currently being sequenced, because the number of anchors shared among all genomes decreases quickly as the number of genomes increases. Another problem is that many genomes (plant genomes in particular) have undergone extensive duplications, which make decoding of genomic architecture and rearrangement analysis in plants difficult. Existing synteny block generation algorithms for plants do not address the problem of generating non-overlapping synteny blocks suitable for analyzing rearrangements and the evolutionary history of duplications.
Results: In this paper we present a new synteny block generation algorithm based on the A-Bruijn graph framework that overcomes these difficulties. We applied our algorithm to derive non-overlapping synteny blocks in Arabidopsis thaliana. We also generalized this approach to synteny block generation for multiple genomes. The algorithm was applied to human-mouse-rat-dog-chicken genomes and it is able to recover synteny blocks missed by algorithms requiring 5-way anchors.
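The anchor-loss problem described in the Motivation can be illustrated with a minimal sketch. It is not the paper's algorithm and uses no real data: each genome is modeled as a set of hypothetical anchor IDs, and the function names shared_anchors and anchor_counts are invented for this illustration. It only shows that requiring an anchor to be present in every genome can only shrink the anchor set as genomes are added.

```python
from functools import reduce


def shared_anchors(genomes):
    """Return the anchors present in every genome (the k-way anchors)."""
    return reduce(set.intersection, genomes)


def anchor_counts(genomes_by_name):
    """Count k-way anchors for the first k genomes, for k = 1..n."""
    names = list(genomes_by_name)
    return [
        len(shared_anchors([genomes_by_name[n] for n in names[:k]]))
        for k in range(1, len(names) + 1)
    ]


# Hypothetical toy data (assumption, not from the paper): anchor IDs per genome.
genomes = {
    "human":   {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
    "mouse":   {1, 2, 3, 4, 5, 6, 7, 8, 9},
    "rat":     {1, 2, 3, 4, 5, 6, 7, 10},
    "dog":     {1, 2, 3, 5, 6, 8, 9, 10},
    "chicken": {1, 2, 4, 5, 7, 8, 9},
}

print(anchor_counts(genomes))  # -> [10, 9, 7, 5, 3]
```

In this toy example only 3 of the 10 anchors survive as 5-way anchors, even though each pair of genomes shares considerably more. Anchors shared by only some of the genomes are discarded by an all-genomes requirement, which is the information the abstract says a 5-way-anchor approach misses.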
Permanent link to this document: http://projecteuclid.org/euclid.cis/1268143370
Zentralblatt MATH identifier: 05700374