

PADDLE

Predictor of Activation Domains using Deep Learning in Eukaryotes

PADDLE is a deep convolutional neural network that predicts acidic transcriptional activation domains (ADs) from protein sequence.

PADDLE's convolutional neural network architecture

PADDLE can predict both the position and the relative strength of acidic ADs. It was trained on high-throughput activation assay data from yeast (S. cerevisiae), but because acidic AD function is highly conserved across eukaryotes, it also predicts activation by human proteins in human cells.

Predicted and experimental activation for tiles across the Arg81 protein
Of 25 ADs predicted by PADDLE and tested in human cells, 23 activated significantly.

See Sanborn et al. eLife (2021) for a full description of the experimental and computational methods.

Code and Data

PADDLE was developed in Python using TensorFlow. To run PADDLE, please see the code on GitHub. Additionally, activation predictions have been pre-generated for all transcription factors and other nuclear proteins in multiple species and are available below. For S. cerevisiae, experimentally measured activation should be used instead; see the supplemental data from Sanborn et al. eLife (2021).

 

Human

Transcription factors: PADDLE prediction values; PADDLE prediction plots; High-strength predicted ADs; Medium-strength predicted ADs
All nuclear proteins: PADDLE prediction values; High-strength predicted ADs; Medium-strength predicted ADs

Mouse

Transcription factors: PADDLE prediction values; PADDLE prediction plots; High-strength predicted ADs; Medium-strength predicted ADs
All nuclear proteins: PADDLE prediction values; High-strength predicted ADs; Medium-strength predicted ADs

Drosophila melanogaster

Transcription factors: PADDLE prediction values; PADDLE prediction plots; High-strength predicted ADs; Medium-strength predicted ADs
All nuclear proteins: PADDLE prediction values; High-strength predicted ADs; Medium-strength predicted ADs

S. pombe

Transcription factors: PADDLE prediction values; PADDLE prediction plots; High-strength predicted ADs; Medium-strength predicted ADs
All nuclear proteins: PADDLE prediction values; High-strength predicted ADs; Medium-strength predicted ADs

Zebrafish

Transcription factors: PADDLE prediction values; PADDLE prediction plots; High-strength predicted ADs; Medium-strength predicted ADs
All nuclear proteins: PADDLE prediction values; High-strength predicted ADs; Medium-strength predicted ADs

Rat

Transcription factors: PADDLE prediction values; PADDLE prediction plots; High-strength predicted ADs; Medium-strength predicted ADs
All nuclear proteins: PADDLE prediction values; High-strength predicted ADs; Medium-strength predicted ADs

PADDLE predicts the activation Z-score for any 53-amino-acid-long protein sequence. Predicted Z-scores range from approximately -1 to 12: a score of 0 indicates no activation, scores greater than 4 are considered significant, and scores greater than 6 are considered strongly significant. PADDLE prediction values are provided as tab-separated values, with each row corresponding to one protein: the protein ID is in the first column, followed by the predicted activation Z-scores for all 53-aa tiles spanning the protein. PADDLE prediction plots display the scores for each protein by position in the protein. Protein regions having 5 or more consecutive tiles with scores greater than 6 are annotated as high-strength predicted ADs, and regions with scores greater than 4 are annotated as medium-strength predicted ADs. In these tables, positions are 1-indexed and inclusive, and the max and mean predicted activation Z-scores across each AD are provided.
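
The file layout and annotation rule described above can be illustrated with a short Python sketch. This is not the code used to generate the published tables; it is a minimal example under stated assumptions. The function names, the example protein ID, and the example scores are hypothetical. The sketch also assumes that the "5 or more consecutive tiles" rule applies to the medium-strength threshold as well, and it reports 0-based tile indices rather than residue positions, because converting tiles to the 1-indexed positions in the AD tables requires the tile spacing, which is not stated here.

```python
from statistics import mean


def parse_prediction_row(line):
    """Split one tab-separated row into (protein_id, [z_scores]).

    Assumes the layout described in the text: protein ID in the first
    column, then one predicted Z-score per tile along the row.
    """
    fields = line.rstrip("\n").split("\t")
    protein_id = fields[0]
    scores = [float(x) for x in fields[1:] if x != ""]
    return protein_id, scores


def find_runs(scores, threshold, min_tiles=5):
    """Return runs of at least `min_tiles` consecutive tiles scoring above `threshold`.

    Each run is (first_tile, last_tile, max_score, mean_score), where the
    tile indices are 0-based and inclusive.
    """
    runs = []
    start = None
    # A sentinel score below any threshold closes a run that reaches the last tile.
    for i, score in enumerate(scores + [float("-inf")]):
        if score > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_tiles:
                window = scores[start:i]
                runs.append((start, i - 1, max(window), mean(window)))
            start = None
    return runs


# Hypothetical row; the ID and scores are made up for illustration.
example_row = "EXAMPLE_PROTEIN\t0.2\t4.5\t6.1\t6.8\t7.2\t6.5\t6.3\t3.9\t0.1"
protein_id, scores = parse_prediction_row(example_row)

high = find_runs(scores, threshold=6)    # candidate high-strength regions
medium = find_runs(scores, threshold=4)  # regions above the medium threshold
print(protein_id, high, medium)
```

Note that a run found with the threshold of 4 also contains any tiles scoring above 6; whether the published medium-strength tables exclude high-strength regions is not specified in the text, so that filtering step is left out.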

Contact Adrian Sanborn (email: a [at] adriansanborn.com) with questions.