µProteInS—a proteogenomics pipeline for finding novel bacterial microproteins encoded by small ORFs

Abstract Summary Genome annotation pipelines traditionally exclude open reading frames (ORFs) shorter than 100 codons to avoid false identifications. However, studies have been showing that these may encode functional microproteins with meaningful biological roles. We developed µProteInS, a proteoge...

Full description

Saved in:
Bibliographic Details
Published in:Bioinformatics (Oxford, England) Vol. 38; no. 9; pp. 2612 - 2614
Main Authors: de Souza, Eduardo Vieira, Dalberto, Pedro Ferrari, Machado, Vinicius Pellisoli, Canedo, Adriana, Saghatelian, Alan, Machado, Pablo, Basso, Luiz Augusto, Bizarro, Cristiano Valim
Format: Journal Article
Language:English
Published: England Oxford University Press 28-04-2022
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Abstract Summary Genome annotation pipelines traditionally exclude open reading frames (ORFs) shorter than 100 codons to avoid false identifications. However, studies have been showing that these may encode functional microproteins with meaningful biological roles. We developed µProteInS, a proteogenomics pipeline that combines genomics, transcriptomics and proteomics to identify novel microproteins in bacteria. Our pipeline employs a model to filter out low confidence spectra, to avoid the need for manually inspecting Mass Spectrometry data. It also overcomes the shortcomings of traditional approaches that usually exclude overlapping genes, leaderless transcripts and non-conserved sequences, characteristics that are common among small ORFs (smORFs) and hamper their identification. Availability and implementation µProteInS is implemented in Python 3.8 within an Ubuntu 20.04 environment. It is an open-source software distributed under the GNU General Public License v3, available as a command-line tool. It can be downloaded at https://github.com/Eduardo-vsouza/uproteins and either installed from source or executed as a Docker image. Supplementary information Supplementary data are available at Bioinformatics online.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:1367-4803
1367-4811
DOI:10.1093/bioinformatics/btac115