Deep profiling of protease substrate specificity enabled by dual random and scanned human proteome substrate phage libraries

Bibliographic Details
Published in: Proceedings of the National Academy of Sciences (PNAS), Vol. 117, no. 41, pp. 25464-25475
Main Authors: Zhou, Jie; Li, Shantao; Leung, Kevin K.; O’Donovan, Brian; Zou, James Y.; DeRisi, Joseph L.; Wells, James A.
Format: Journal Article
Language: English
Published: United States: National Academy of Sciences, 13 October 2020
Description
Summary: Proteolysis is a major posttranslational regulator of biology inside and outside of cells. Broad identification of optimal cleavage sites and natural substrates of proteases is critical for drug discovery and for understanding protease biology. Here, we present a method that employs two genetically encoded substrate phage display libraries coupled with next-generation sequencing (SPD-NGS) that allows up to 10,000-fold deeper sequence coverage of the typical six- to eight-residue protease cleavage sites compared to state-of-the-art synthetic peptide libraries or proteomics. We applied SPD-NGS to two classes of proteases: the intracellular caspases and the ectodomains of the sheddases ADAMs 10 and 17. The first library (Lib 10AA) allowed us to identify 10⁴ to 10⁵ unique cleavage sites over a 1,000-fold dynamic range of NGS counts and produced consensus and optimal cleavage motifs based on position-specific scoring matrices. A second SPD-NGS library (Lib hP), which displayed virtually the entire human proteome tiled in contiguous 49-amino-acid sequences with 25-amino-acid overlaps, enabled us to identify candidate human proteome sequences. We identified up to 10⁴ natural linear cut sites, depending on the protease, and captured most of the examples previously identified by proteomics and predicted 10- to 100-fold more. Structural bioinformatics was used to facilitate the identification of candidate natural protein substrates. SPD-NGS is rapid, reproducible, simple to perform and analyze, inexpensive, and renewable, with unprecedented depth of coverage for substrate sequences. It is an important tool for protease biologists interested in protease specificity for specific assays and inhibitors, and it facilitates identification of natural protein substrates.
Author contributions: J.Z., J.L.D., and J.A.W. designed research; J.Z. performed research; J.Z., B.O., and J.L.D. contributed new reagents/analytic tools; J.Z., S.L., K.K.L., and J.Y.Z. analyzed data; and J.Z. and J.A.W. wrote the paper.
Edited by Benjamin F. Cravatt, Scripps Research Institute, La Jolla, CA, and approved August 19, 2020 (received for review May 11, 2020)
ISSN: 0027-8424, 1091-6490
DOI: 10.1073/pnas.2009279117