µProteInS—a proteogenomics pipeline for finding novel bacterial microproteins encoded by small ORFs
Abstract Summary Genome annotation pipelines traditionally exclude open reading frames (ORFs) shorter than 100 codons to avoid false identifications. However, studies have been showing that these may encode functional microproteins with meaningful biological roles. We developed µProteInS, a proteoge...
Saved in:
Published in: | Bioinformatics (Oxford, England) Vol. 38; no. 9; pp. 2612 - 2614 |
---|---|
Main Authors: | , , , , , , , |
Format: | Journal Article |
Language: | English |
Published: |
England
Oxford University Press
28-04-2022
|
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Abstract
Summary
Genome annotation pipelines traditionally exclude open reading frames (ORFs) shorter than 100 codons to avoid false identifications. However, studies have been showing that these may encode functional microproteins with meaningful biological roles. We developed µProteInS, a proteogenomics pipeline that combines genomics, transcriptomics and proteomics to identify novel microproteins in bacteria. Our pipeline employs a model to filter out low confidence spectra, to avoid the need for manually inspecting Mass Spectrometry data. It also overcomes the shortcomings of traditional approaches that usually exclude overlapping genes, leaderless transcripts and non-conserved sequences, characteristics that are common among small ORFs (smORFs) and hamper their identification.
Availability and implementation
µProteInS is implemented in Python 3.8 within an Ubuntu 20.04 environment. It is an open-source software distributed under the GNU General Public License v3, available as a command-line tool. It can be downloaded at https://github.com/Eduardo-vsouza/uproteins and either installed from source or executed as a Docker image.
Supplementary information
Supplementary data are available at Bioinformatics online. |
---|---|
Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
ISSN: | 1367-4803 1367-4811 |
DOI: | 10.1093/bioinformatics/btac115 |