PhyKIT: a broadly applicable UNIX shell toolkit for processing and analyzing phylogenomic data

Abstract Motivation Diverse disciplines in biology process and analyze multiple sequence alignments (MSAs) and phylogenetic trees to evaluate their information content, infer evolutionary events and processes and predict gene function. However, automated processing of MSAs and trees remains a challe...

Full description

Saved in:

Bibliographic Details
Published in:	Bioinformatics (Oxford, England) Vol. 37; no. 16; pp. 2325 - 2331
Main Authors:	Steenwyk, Jacob L, Buida, Thomas J, Labella, Abigail L, Li, Yuanning, Shen, Xing-Xing, Rokas, Antonis
Format:	Journal Article
Language:	English
Published:	England Oxford University Press 25-08-2021 Oxford Publishing Limited (England)
Subjects:	Availability Bias Bioinformatics Biological evolution Information processing Mutation rates Original Papers Phylogenetics Phylogeny Toolkits UNIX
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	Abstract Motivation Diverse disciplines in biology process and analyze multiple sequence alignments (MSAs) and phylogenetic trees to evaluate their information content, infer evolutionary events and processes and predict gene function. However, automated processing of MSAs and trees remains a challenge due to the lack of a unified toolkit. To fill this gap, we introduce PhyKIT, a toolkit for the UNIX shell environment with 30 functions that process MSAs and trees, including but not limited to estimation of mutation rate, evaluation of sequence composition biases, calculation of the degree of violation of a molecular clock and collapsing bipartitions (internal branches) with low support. Results To demonstrate the utility of PhyKIT, we detail three use cases: (1) summarizing information content in MSAs and phylogenetic trees for diagnosing potential biases in sequence or tree data; (2) evaluating gene–gene covariation of evolutionary rates to identify functional relationships, including novel ones, among genes and (3) identify lack of resolution events or polytomies in phylogenetic trees, which are suggestive of rapid radiation events or lack of data. We anticipate PhyKIT will be useful for processing, examining and deriving biological meaning from increasingly large phylogenomic datasets. Availability and implementation PhyKIT is freely available on GitHub (https://github.com/JLSteenwyk/PhyKIT), PyPi (https://pypi.org/project/phykit/) and the Anaconda Cloud (https://anaconda.org/JLSteenwyk/phykit) under the MIT license with extensive documentation and user tutorials (https://jlsteenwyk.com/PhyKIT). Supplementary information Supplementary data are available at Bioinformatics online.
Bibliography:	ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23
ISSN:	1367-4803 1367-4811
DOI:	10.1093/bioinformatics/btab096