PADDLE

Predictor of Activation Domains using Deep Learning in Eukaryotes

PADDLE is a deep convolutional neural network that predicts acidic transcriptional activation domains (ADs) from protein sequence.

PADDLE's convolutional neural network architecture

PADDLE can predict both the position and relative strength of acidic ADs. It was trained on high-throughput activation assay data in yeast (S. cerevisiae), but because acidic AD function is highly conserved across eukaryotes, it also predicts activation of human proteins in human cells.

Predicted and experimental activation for tiles across the Arg81 protein

Of 25 ADs predicted by PADDLE and tested in human cells, 23 activated significantly.

See Sanborn et al. eLife (2021) for a full description of the experimental and computational methods.

Code and Data

PADDLE was developed in Python using TensorFlow. To run PADDLE, please see the code on GitHub. Additionally, activation predictions have been pre-generated for all transcription factors and other nuclear proteins in multiple species and are available below. For S. cerevisiae, experimentally measured activation should be used instead; see the supplemental data from Sanborn et al. eLife (2021).

Species	Transcription factors	All nuclear proteins
Human	PADDLE prediction values, PADDLE prediction plots, High-strength predicted ADs, Medium-strength predicted ADs	PADDLE prediction values, High-strength predicted ADs, Medium-strength predicted ADs
Mouse	PADDLE prediction values, PADDLE prediction plots, High-strength predicted ADs, Medium-strength predicted ADs	PADDLE prediction values, High-strength predicted ADs, Medium-strength predicted ADs
*Drosophila melanogaster*	PADDLE prediction values, PADDLE prediction plots, High-strength predicted ADs, Medium-strength predicted ADs	PADDLE prediction values, High-strength predicted ADs, Medium-strength predicted ADs
*S. pombe*	PADDLE prediction values, PADDLE prediction plots, High-strength predicted ADs, Medium-strength predicted ADs	PADDLE prediction values, High-strength predicted ADs, Medium-strength predicted ADs
Zebrafish	PADDLE prediction values, PADDLE prediction plots, High-strength predicted ADs, Medium-strength predicted ADs	PADDLE prediction values, High-strength predicted ADs, Medium-strength predicted ADs
Rat	PADDLE prediction values, PADDLE prediction plots, High-strength predicted ADs, Medium-strength predicted ADs	PADDLE prediction values, High-strength predicted ADs, Medium-strength predicted ADs

PADDLE predicts an activation Z-score for any 53-amino-acid protein sequence. Predicted Z-scores range from approximately -1 to 12: a score of 0 indicates no activation, scores greater than 4 are considered significant, and scores greater than 6 are considered strongly significant. PADDLE prediction values are provided as tab-separated files in which each row corresponds to one protein: the first column holds the protein ID, and the remaining columns hold the predicted activation Z-scores for all 53-aa tiles spanning that protein. PADDLE prediction plots display each protein's scores by position along the protein. Protein regions having 5 or more consecutive tiles with scores greater than 6 are annotated as high-strength predicted ADs, and regions with scores greater than 4 are annotated as medium-strength predicted ADs. In the predicted-AD tables, positions are 1-indexed and inclusive, and the maximum and mean predicted activation Z-scores across each AD are provided.
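
For readers who want to work with the prediction value files directly, the following is a minimal sketch of reading one row and finding runs of high-scoring tiles. It is not the code used to produce the predicted-AD tables. The function names, the example row, and the handling of empty cells are all hypothetical. It also assumes that the 5-consecutive-tile rule applies to the medium-strength threshold as well, and it reports tile indices rather than residue positions, because converting between the two requires the tile spacing, which is not stated here.

```python
import csv
import io
from statistics import mean

HIGH_THRESHOLD = 6.0    # "strongly significant" cutoff from the text
MEDIUM_THRESHOLD = 4.0  # "significant" cutoff from the text
MIN_RUN = 5             # stated for high-strength ADs; assumed here for medium


def parse_prediction_rows(handle):
    """Yield (protein_id, [z_scores]) from a tab-separated prediction file."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        protein_id, values = row[0], row[1:]
        # Ignore empty cells, in case rows are padded (an assumption).
        yield protein_id, [float(v) for v in values if v.strip()]


def call_regions(scores, threshold, min_run=MIN_RUN):
    """Return maximal runs of at least min_run consecutive tiles above threshold.

    Each run is (first_tile, last_tile, max_score, mean_score), with tile
    indices 1-indexed and inclusive.
    """
    regions = []
    start = None
    # A trailing sentinel closes any run that reaches the end of the row.
    for i, score in enumerate(list(scores) + [float("-inf")]):
        if score > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_run:
                run = scores[start:i]
                regions.append((start + 1, i, max(run), mean(run)))
            start = None
    return regions


# Made-up example row for illustration; not real PADDLE output.
example = "PROT_A\t0.1\t4.5\t6.2\t6.8\t7.1\t6.5\t6.3\t4.2\t0.3\n"
for protein_id, scores in parse_prediction_rows(io.StringIO(example)):
    high = call_regions(scores, HIGH_THRESHOLD)
    medium = call_regions(scores, MEDIUM_THRESHOLD)
    print(protein_id, "high:", high, "medium:", medium)
```

In this sketch a medium-strength run can contain a high-strength run. Whether the published tables report such regions separately or merge them is not specified above, so treat the output as exploratory.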

Contact Adrian Sanborn (email: a [at] adriansanborn.com) with questions.
