Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants

Bibliographic Details
Published in:	Nucleic Acids Research Vol. 38; no. 6; pp. 1767-1771
Main Authors:	Cock, Peter J.A.; Fields, Christopher J.; Goto, Naohisa; Heuer, Michael L.; Rice, Peter M.
Format:	Journal Article
Language:	English
Published:	England: Oxford University Press, 01-04-2010
Subjects:	Computational Biology - history; History, 20th Century; History, 21st Century; Sequence Analysis, DNA - history; Sequence Analysis, DNA - standards; Software; Survey and Summary

Description
Summary:	FASTQ has emerged as a common file format for sharing sequencing read data combining both the sequence and an associated per base quality score, despite lacking any formal definition to date, and existing in at least three incompatible variants. This article defines the FASTQ format, covering the original Sanger standard, the Solexa/Illumina variants and conversion between them, based on publicly available information such as the MAQ documentation and conventions recently agreed by the Open Bioinformatics Foundation projects Biopython, BioPerl, BioRuby, BioJava and EMBOSS. Being an open access publication, it is hoped that this description, with the example files provided as Supplementary Data, will serve in future as a reference for this important file format.
ISSN:	0305-1048; 1362-4962
DOI:	10.1093/nar/gkp1137
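
The conversion between variants mentioned in the summary can be sketched as follows. This is a minimal illustration, not the article's reference code. The ASCII offsets (33 for Sanger, 64 for Solexa and Illumina 1.3+) and the Solexa-to-PHRED formula are stated from general knowledge of the formats rather than from this record, and all function names are hypothetical.

```python
import math

# Assumed encodings (not given in the record above):
#   Sanger        : PHRED scores,  ASCII offset 33
#   Solexa        : Solexa scores, ASCII offset 64
#   Illumina 1.3+ : PHRED scores,  ASCII offset 64
SANGER_OFFSET = 33
SOLEXA_OFFSET = 64
ILLUMINA_OFFSET = 64


def solexa_to_phred(q_solexa: float) -> float:
    """Map a Solexa score to the equivalent PHRED score."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def illumina13_to_sanger(quality: str) -> str:
    """Re-encode an Illumina 1.3+ quality string as Sanger.

    Both variants hold PHRED scores, so only the ASCII offset changes.
    """
    return "".join(
        chr(ord(c) - ILLUMINA_OFFSET + SANGER_OFFSET) for c in quality
    )


def solexa_to_sanger(quality: str) -> str:
    """Re-encode a Solexa quality string as Sanger.

    Each score is converted to PHRED and rounded to the nearest integer,
    so the conversion is lossy for low scores.
    """
    converted = []
    for c in quality:
        q_solexa = ord(c) - SOLEXA_OFFSET
        q_phred = round(solexa_to_phred(q_solexa))
        converted.append(chr(q_phred + SANGER_OFFSET))
    return "".join(converted)


if __name__ == "__main__":
    print(illumina13_to_sanger("hhh"))  # PHRED 40 three times -> "III"
    print(solexa_to_sanger("@"))        # Solexa 0 -> PHRED 3 -> "$"
```

The sequence and title lines of a record are unaffected by such a conversion; only the quality string is rewritten. For real data, the established parsers in the projects named in the summary are a safer choice than a hand-written converter.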