A multi-task convolutional deep neural network for variant calling in single molecule sequencing

Bibliographic Details
Published in: Nature Communications Vol. 10; no. 1; pp. 998 - 11
Main Authors: Luo, Ruibang; Sedlazeck, Fritz J.; Lam, Tak-Wah; Schatz, Michael C.
Format: Journal Article
Language: English
Published: London Nature Publishing Group UK 01-03-2019
Nature Publishing Group
Nature Portfolio
Description
Summary: The accurate identification of DNA sequence variants is an important but challenging task in genomics. It is particularly difficult for single molecule sequencing, which has a per-nucleotide error rate of ~5–15%. To meet this need, we developed Clairvoyante, a multi-task five-layer convolutional neural network model for predicting variant type (SNP or indel), zygosity, alternative allele, and indel length from aligned reads. For the well-characterized NA12878 human sample, Clairvoyante achieves F1-scores of 99.67%, 95.78%, and 90.53% on 1KP common variants, and 98.65%, 92.57%, and 87.26% for whole-genome analysis, using Illumina, PacBio, and Oxford Nanopore data, respectively. Training on a second human sample shows that Clairvoyante is sample agnostic and finds variants in less than 2 h on a standard server. Furthermore, we present 3,135 variants that are missed using Illumina but supported independently by both PacBio and Oxford Nanopore reads. Clairvoyante is available open-source (https://github.com/aquaskyline/Clairvoyante), with modules to train, utilize, and visualize the model. Single Molecule Sequencing (SMS) technologies generate long but noisy read data. Here, the authors develop Clairvoyante, a deep neural network-based method for variant calling with SMS reads such as PacBio and ONT data.
ISSN: 2041-1723
DOI: 10.1038/s41467-019-09025-z