Building the Coleoptera tree‐of‐life for >8000 species: composition of public DNA data and fit with Linnaean classification

Bibliographic Details
Published in: Systematic Entomology Vol. 39; no. 1; pp. 97-110
Main Authors: BOCAK, LADISLAV, BARTON, CHRISTOPHER, CRAMPTON‐PLATT, ALEX, CHESTERS, DOUGLAS, AHRENS, DIRK, VOGLER, ALFRIED P
Format: Journal Article
Language:English
Published: Oxford, UK Blackwell Publishing Ltd 2014
Wiley Subscription Services, Inc
Description
Summary: The species representation of public databases is growing rapidly and permits increasingly detailed phylogenetic inferences. We present a supermatrix based on all gene sequences of Coleoptera available in GenBank for two nuclear (18S and 28S rRNA) and two mitochondrial (rrnL and cox1) genes. After filtering for unique species names and the addition of ~2000 unpublished sequences for cox1 and 18S rRNA, the resulting data matrix included 8441 species‐level terminals and 6600 aligned nucleotide positions. The concatenated matrix represents the equivalent of 2.17% of the 390 000 described species of Coleoptera and includes 152 beetle families. The remaining 29 families constitute small lineages with ~250 known species in total. Taxonomic coverage remains low for several major lineages, including Buprestidae (0.16% of described species), Staphylinidae (1.03%), Tenebrionidae (0.90%) and Cerambycidae (0.58%). The current taxon sampling was strongly biased towards the Northern Hemisphere. Phylogenetic trees obtained from the supermatrix were in very good agreement with the Linnaean classification, in particular at the family level, but lower for the subfamily and lowest for the genus level. The topology supports the basal split of Derodontidae and Scirtoidea from the remaining Polyphaga, and the broad paraphyly of Cucujoidea. The data extraction pipeline and detailed tree provide a framework for placement of any new sequences, including environmental samples, into a DNA‐based classification system of Coleoptera.
Bibliography:http://dx.doi.org/10.1111/syen.12037
ArticleID:SYEN12037
Science Foundation of the Czech Republic - No. P506/11/1757
istex:9863BF2D1AD225AE4F99E1351C1EC0B1021D119B
ark:/67375/WNG-RRXKH116-T
German Science Foundation - No. DFG-AH175/1-2
NERC CASE studentship - No. NER/S/A/2006/14013
Leverhulme Trust - No. F/00696/P
Table S1. Comparison of beetle classifications.
Table S2. Overview of the GenBank coverage.
Table S3. Overview of beetle families represented in GenBank for five fragments used for phylogenetic inference, tRIs indexes for superfamilies and families and recovery of clades in the tree illustrated in Fig. and Figure S1.
Table S4. Geographic origin of Coleoptera sequences deposited in GenBank (Data in bold refer to the regions shown in Fig. ).
Table S5. Taxonomic indexes of inferred trees and an overview of misplaced taxa in five inferred trees, with GenBank accession numbers, numbers of trees where misplaced and number of fragments, for which misplaced taxon was sequenced.
Table S6. Taxon codes of taxa and GenBank accession numbers for terminals in Figure S1.
Figure S1. The full-resolution tree inferred from the BlastAlign of five loci and 8441 taxa by maximum likelihood analysis.
Figure S2. The numbers of sequenced and known species from the loci included in the analysis.
Figure S3. Number of taxa represented by various loci combinations.
Figure S4. The tRI values and proportion of sequenced species (a) for superfamilies and (b) for families with more than 20 samples in the final dataset.
Figure S5. The percentage of misplaced taxa and the percentage of length invariable loci for samples represented by one to four loci in the final dataset.
ISSN:0307-6970
1365-3113
DOI:10.1111/syen.12037