PADDLE
Predictor of Activation Domains using Deep Learning in Eukaryotes
PADDLE is a deep convolutional neural network that predicts acidic transcriptional activation domains (ADs) from protein sequence.

PADDLE can predict both the position and relative strength of acidic ADs. It was trained on high-throughput activation assay data in yeast (S. cerevisiae), but because acidic AD function is highly conserved across eukaryotes, it also predicts activation of human proteins in human cells.


See Sanborn et al. eLife (2021) for a full description of the experimental and computational methods.
Code and Data
PADDLE was developed in Python using TensorFlow. To run PADDLE, please see the code on GitHub. Activation predictions have also been pre-generated for all transcription factors and other nuclear proteins in multiple species, available below. For S. cerevisiae, use the experimentally measured activation instead; see the supplemental data from Sanborn et al. eLife (2021).
PADDLE predicts the activation Z-score for any 53-amino-acid protein sequence. Predicted Z-scores range from approximately -1 to 12: a score of 0 indicates no activation, scores greater than 4 are considered significant, and scores greater than 6 are considered strongly significant.

PADDLE prediction values are provided as tab-separated values with one row per protein: the protein ID is in the first column, followed by the predicted activation Z-scores for all 53-aa tiles spanning the protein. PADDLE prediction plots display the scores for each protein by position along the protein.

Protein regions with 5 or more consecutive tiles scoring greater than 6 are annotated as high-strength predicted ADs, and regions with scores greater than 4 are annotated as medium-strength predicted ADs. In the AD annotation tables, positions are 1-indexed and inclusive, and the maximum and mean predicted activation Z-scores across each AD are provided.
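The tiling and AD annotation rules described above can be sketched in Python. This is a minimal illustration, not the PADDLE code, and it rests on several assumptions not stated on this page: tiles are assumed to advance one residue at a time, so tile k (0-based) covers residues k+1 to k+53; the prediction TSV is assumed to have no header row; the 5-consecutive-tile requirement is assumed to apply to the medium tier as well, with medium calls that overlap a high-strength AD dropped; and an AD's span is taken to run from the first residue of its first tile to the last residue of its last tile. All function names are hypothetical.

```python
import csv
import io
from statistics import mean

TILE_LEN = 53      # PADDLE scores 53-aa tiles
MIN_TILES = 5      # minimum run of consecutive tiles (assumed for both tiers)
HIGH, MEDIUM = 6.0, 4.0

def tiles(seq, step=1):
    """Split a protein sequence into 53-aa tiles (hypothetical helper; step=1 assumed)."""
    return [seq[i:i + TILE_LEN] for i in range(0, len(seq) - TILE_LEN + 1, step)]

def parse_predictions(handle):
    """Yield (protein_id, scores) from TSV rows: ID, then one Z-score per tile."""
    for row in csv.reader(handle, delimiter="\t"):
        if row:
            yield row[0], [float(x) for x in row[1:] if x.strip()]

def runs_above(scores, threshold, min_tiles=MIN_TILES):
    """Return 0-based (first_tile, last_tile) runs of >= min_tiles scores above threshold."""
    runs, start = [], None
    for i, z in enumerate(scores + [float("-inf")]):  # sentinel closes a trailing run
        if z > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_tiles:
                runs.append((start, i - 1))
            start = None
    return runs

def call_ads(scores, step=1):
    """Return (tier, start, end, max_z, mean_z) per AD; start/end are 1-indexed, inclusive.

    Assumes tile k (0-based) covers residues k*step+1 .. k*step+TILE_LEN.
    """
    high = runs_above(scores, HIGH)
    # Assumption: medium-strength ADs are >4 runs that do not overlap a high-strength AD.
    medium = [r for r in runs_above(scores, MEDIUM)
              if not any(r[0] <= h[1] and h[0] <= r[1] for h in high)]
    calls = []
    for tier, runs in (("high", high), ("medium", medium)):
        for first, last in runs:
            run_scores = scores[first:last + 1]
            calls.append((tier, first * step + 1, last * step + TILE_LEN,
                          max(run_scores), mean(run_scores)))
    return sorted(calls, key=lambda c: c[1])

if __name__ == "__main__":
    # Synthetic row for illustration only; not real PADDLE output.
    example = "PROT1\t0.1\t4.5\t4.6\t4.8\t5.0\t4.2\t0.3\t6.5\t7.0\t6.8\t6.2\t6.1\t1.0\n"
    for pid, scores in parse_predictions(io.StringIO(example)):
        for ad in call_ads(scores):
            print(pid, ad)
```

On the synthetic row, tiles 2 to 6 (scores between 4 and 6) give a medium-strength AD spanning residues 2 to 58, and tiles 8 to 12 (scores above 6) give a high-strength AD spanning residues 8 to 64. If the real files use a different tile step or define AD boundaries differently (for example, by tile centers), adjust `step` and the position arithmetic in `call_ads` accordingly.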
Contact Adrian Sanborn (email: a [at] adriansanborn.com) with questions.