Kmasker plants – a tool for assessing complex sequence space in plant species

Bibliographic Details
Published in: The Plant Journal: for Cell and Molecular Biology, Vol. 102, no. 3, pp. 631–642
Main Authors: Beier, Sebastian, Ulpinnis, Chris, Schwalbe, Markus, Münch, Thomas, Hoffie, Robert, Koeppel, Iris, Hertig, Christian, Budhagatapalli, Nagaveni, Hiekel, Stefan, Pathi, Krishna M., Hensel, Goetz, Grosse, Martin, Chamas, Sindy, Gerasimova, Sophia, Kumlehn, Jochen, Scholz, Uwe, Schmutzer, Thomas
Format: Journal Article
Language: English
Published: England: Blackwell Publishing Ltd, 01-05-2020
Description
Summary: Many plant genomes display high levels of repetitive sequences. Assembling these complex genomes from short high‐throughput sequence reads is still a challenging task, and underestimating or disregarding repeat complexity in these datasets can easily misguide downstream analysis. Detection of repetitive regions by k‐mer counting methods has proved to be reliable, and easy‐to‐use applications based on k‐mer counting are in high demand, especially in the plant domain. We present Kmasker plants, a tool provided as both a command‐line and a web‐based solution, that uses k‐mer count information to assist throughout the analytical workflow of genome data. Besides its core function of screening and masking repetitive sequences, we have integrated features that enable comparative studies between different cultivars or closely related species, as well as methods that estimate the target specificity of guide RNAs for site‐directed mutagenesis using Cas9 endonuclease. In addition, we have set up a web service for Kmasker plants that maintains pre‐computed indices for 10 of the most economically important cultivated plants. The source code for Kmasker plants is publicly available at https://github.com/tschmutzer/kmasker. The web service is accessible at https://kmasker.ipk-gatersleben.de.
Significance Statement: Whole‐genome shotgun sequencing produces billions of sequencing reads that can be processed as short sequence words, called k‐mers, which can reveal differences between species or detect sequence regions with striking patterns. Kmasker plants utilizes k‐mer count information for a range of applications, allowing sequence data scientists as well as non‐bioinformatics experts to perform k‐mer analysis on whole‐genome shotgun sequencing data for their species of interest.
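The basic idea behind k‐mer‐based repeat detection and masking can be illustrated with a minimal Python sketch. It is not the Kmasker plants implementation. It assumes in-memory reads, canonical (strand-independent) k‐mers, and a relative "repeat frequency" obtained by dividing each k‐mer count by an assumed sequencing depth; the names `count_kmers`, `mask_repeats`, `depth` and `min_rf` are hypothetical. Every base covered by a k‐mer at or above the threshold is hard-masked with N.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_ACGT = set("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse
    complement, so both DNA strands are counted as the same word."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def count_kmers(reads, k):
    """Count canonical k-mers in reads, skipping k-mers containing N or other symbols."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= _ACGT:
                counts[canonical(kmer)] += 1
    return counts


def mask_repeats(seq, counts, k, depth, min_rf):
    """Hard-mask (with N) every base covered by a k-mer whose count divided by
    the assumed sequencing depth (a hypothetical 'repeat frequency') is >= min_rf."""
    seq = seq.upper()
    masked = list(seq)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= _ACGT and counts.get(canonical(kmer), 0) / depth >= min_rf:
            masked[i:i + k] = "N" * k  # mask the whole k-mer window
    return "".join(masked)


if __name__ == "__main__":
    # Toy data: a poly-A repeat sequenced at an assumed depth of 5x plus one unique read.
    reads = ["AAAAAAAA"] * 5 + ["CCGTTGAC"]
    counts = count_kmers(reads, k=4)
    print(mask_repeats("CCAAAAAGG", counts, k=4, depth=5.0, min_rf=2.0))
```

A real plant dataset yields billions of k‐mers, so practical tools keep counts in compact, disk-backed indices rather than a Python dictionary; the sketch only shows the counting and thresholding logic.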
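Guide RNA specificity can be approached with the same k‐mer counts. The sketch below is an assumption about the general idea, not the scoring used by Kmasker plants: it indexes canonical k‐mers from assembled sequences (so counts approximate copy numbers), profiles each k‐mer of a candidate guide against that index, and calls the guide specific only if every k‐mer occurs at least once and at most `max_copies` times. The helpers `build_index`, `guide_profile` and `looks_specific` are hypothetical, and exact k‐mer matching does not model mismatch-tolerant off-target binding by Cas9.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Strand-independent representation of a k-mer."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def build_index(contigs, k):
    """Count canonical k-mers in assembled sequences (counts ~ copy number)."""
    index = Counter()
    for contig in contigs:
        contig = contig.upper()
        for i in range(len(contig) - k + 1):
            kmer = contig[i:i + k]
            if set(kmer) <= set("ACGT"):
                index[canonical(kmer)] += 1
    return index


def guide_profile(guide, index, k):
    """Genome copy count of every k-mer in the guide, in guide order."""
    guide = guide.upper()
    return [index.get(canonical(guide[i:i + k]), 0) for i in range(len(guide) - k + 1)]


def looks_specific(guide, index, k, max_copies=1):
    """Hypothetical rule: every guide k-mer is present, but in at most max_copies copies."""
    profile = guide_profile(guide, index, k)
    return bool(profile) and all(0 < c <= max_copies for c in profile)


if __name__ == "__main__":
    index = build_index(["ACGGTCA", "TTTTTTTT"], k=4)
    for guide in ["ACGGTCA", "TTTTTT", "CCCCCC"]:
        print(guide, guide_profile(guide, index, 4), looks_specific(guide, index, 4))
```

Because the k‐mers are canonical, a guide and its reverse complement produce the same profile, which matters since a target site can lie on either strand.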
ISSN: 0960-7412; 1365-313X
DOI: 10.1111/tpj.14645