Integrative Meta-Assembly Pipeline (IMAP): Chromosome-level genome assembler combining multiple de novo assemblies

Bibliographic Details
Published in:	PLoS ONE Vol. 14; no. 8; p. e0221858
Main Authors:	Song, Giltae, Lee, Jongin, Kim, Juyeon, Kang, Seokwoo, Lee, Hoyong, Kwon, Daehong, Lee, Daehwan, Lang, Gregory I, Cherry, J Michael, Kim, Jaebum
Format:	Journal Article
Language:	English
Published:	Public Library of Science (PLoS), United States, 27-08-2019
Subjects:	Assemblies; Assembly; Baking yeast; Bioinformatics; Biological evolution; Biology and Life Sciences; Chromosomes; Chromosomes, Fungal; Computational neuroscience; Computer science; Datasets; DNA sequencing; Engineering; Evolution; Gene sequencing; Genetic aspects; Genetics; Genome, Fungal; Genomes; Genomics; Laboratories; Medical research; Methods; Molecular Sequence Annotation; Neurospora; Nucleotide sequence; Perl; Physiological aspects; Pipelines; Research and Analysis Methods; Saccharomyces cerevisiae; Sequence Analysis, DNA; Software; Spatial discrimination; Spatial resolution; Synteny - genetics; Time dependence; Yeast; Yeasts; South Korea; United States > US

Description
Summary:	Genomic data have become a major resource for understanding complex mechanisms at fine-scale temporal and spatial resolution in functional and evolutionary genetic studies, including studies of human diseases such as cancer. Recently, a large number of whole genomes from evolving populations of yeast (Saccharomyces cerevisiae W303 strain) were sequenced in a time-dependent manner to identify temporal evolutionary patterns. This type of study requires a chromosome-level sequence assembly of the strain or population at time zero, against which the genomes derived later can be compared. However, experimental evolution studies lack a fully automated computational approach for establishing a chromosome-level genome assembly that exploits the unique features of the sequencing data. In this study, we developed a new software pipeline, the integrative meta-assembly pipeline (IMAP), which builds chromosome-level genome sequence assemblies by generating and combining multiple initial assemblies produced by three de novo assemblers from short-read sequencing data. Using a large collection of sequencing data and hybrid assembly approaches, we significantly improved the continuity and accuracy of the genome assembly. We validated our pipeline by generating chromosome-level assemblies of the yeast strains W303 and SK1 and comparing them, using various assembly evaluation metrics, with assemblies built from long-read sequencing data. We also constructed chromosome-level sequence assemblies of S. cerevisiae strain Sigma1278b and three commonly used fungal strains (Aspergillus nidulans A713, Neurospora crassa 73, and Thielavia terrestris CBS 492.74) for which long-read sequencing data are not yet available. Finally, we examined how IMAP parameters, such as the reference and the resolution, affect the quality of the final assemblies of the yeast strains W303 and SK1. We developed a cost-effective pipeline that generates chromosome-level sequence assemblies using only short-read sequencing data.
Our pipeline combines the strengths of reference-guided and meta-assembly approaches. It is available online at http://github.com/jkimlab/IMAP, together with a Docker image and a Perl script that help users install the IMAP package along with several prerequisite programs. With IMAP, users can easily build a chromosome-level assembly of a genome of interest.
Competing Interests:	The authors have declared that no competing interests exist.
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0221858