Drug modalities continue to evolve and the capture of data from recent literature ensures that ChEMBL remains up-to-date with recent developments. Drugs and clinical candidates in ChEMBL undergo enhanced curation, including annotation of their drug mechanisms (the therapeutic target with a role in disease efficacy and the action type of the drug against the entity, e.g. INHIBITOR). Classical small-molecule modulators typically bind to the active site of a target, leading to either a loss or, in some cases, an increase in protein function. More recently, emerging strategies such as Targeted Protein Degradation (TPD) have gained traction and instead remove (degrade) the target. The growing body of TPD data in ChEMBL will be discussed in this blog post.
Targeted Protein Degraders (PROTACs) from the ChEMBL database.
To explore TPD data in ChEMBL, we’ve used a combination of TPD-relevant keywords and effector protein accessions (UniProt). If you’re only interested in the extracted dataset, simply scroll down to the link at the bottom of this blog post! To find out more about TPD and our strategy for bioactivity dataset extraction, continue reading!
What is Targeted Protein Degradation (TPD) and how does it work?
TPD is a pharmacological approach whereby a heterobifunctional compound (small molecule or antibody) binds to a therapeutic target protein (protein of interest, or PoI) and directs it to the cellular degradation machinery via an effector protein (effector) that is also bound by the compound. This method can target proteins sometimes classified as "undruggable", i.e. those classes of protein that are difficult to modulate. Rather than attempting to alter the target’s activity, this approach aims to remove the target by degradation.
Compounds functioning in TPD typically comprise three key elements:
- PoI ligand – selectively binds to the protein of interest (PoI).
- Effector ligand – binds to and recruits the degradation machinery (e.g., an E3 ubiquitin ligase in the PROTAC technology).
- Linker region – connects the two ligands.
The degradation machinery that’s recruited depends on the TPD platform, with Proteolysis Targeting Chimeric Molecules (PROTACs) currently the best-established approach. PROTACs are heterobifunctional molecules that bind a disease-causing protein alongside an E3 ligase effector, leading to ubiquitination and subsequent degradation of the target protein (PoI). Another approach, ATTECs (Autophagosome-Tethering Compounds), uses a heterobifunctional molecule to target proteins for autophagy by binding the PoI and autophagosome effectors such as LC3. Other TPD strategies include HEMTACs, which use HSP90, and RNA-binding RIBOTACs, while increasingly complex compounds such as TRAFTACs are emerging as this technology expands.
How to Identify and Extract TPD Data from ChEMBL
As the field advances, TPD-related data in ChEMBL is growing in both quantity and diversity. We’ve explored two approaches to extract TPD data from the ChEMBL database: (1) using accessions for TPD effector proteins such as E3 ligases, and (2) mining assay descriptions for TPD-related keywords.
Approaches to mine ChEMBL for TPD data
1. Accession-based approach:
We used a published collection of UniProt accessions for E3 ligases (Liu et al., 2023) and supplemented these with manually curated accessions for other platforms (e.g. HEMTAC, LYTAC). Since targeted protein degraders bind multiple targets (effector and PoI) and place these in proximity, the TARGET-TYPE of targeted protein degraders in ChEMBL is “PROTEIN-PROTEIN INTERACTION”. This filter was applied to enable the selection of effectors within a TPD context.
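As a rough illustration of the accession-based filter, the sketch below builds a tiny in-memory SQLite database and selects PROTEIN-PROTEIN INTERACTION targets that contain at least one effector accession. The table and column names mirror the public ChEMBL schema, but the rows, the two example accessions (VHL and CRBN) and the function names are assumptions made for this example; this is not the SQL we ran.

```python
import sqlite3

# Example effector accessions only (VHL and CRBN in UniProt); the curated
# accession list referred to in this post is not reproduced here.
EFFECTOR_ACCESSIONS = {"P40337", "Q96SW2"}


def build_toy_db():
    """Create a small in-memory stand-in for three ChEMBL tables (toy rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE target_dictionary (
            tid INTEGER PRIMARY KEY, target_type TEXT, pref_name TEXT);
        CREATE TABLE target_components (tid INTEGER, component_id INTEGER);
        CREATE TABLE component_sequences (
            component_id INTEGER PRIMARY KEY, accession TEXT);
        INSERT INTO target_dictionary VALUES
            (1, 'PROTEIN-PROTEIN INTERACTION', 'Toy effector/PoI pair'),
            (2, 'SINGLE PROTEIN', 'Toy effector alone'),
            (3, 'PROTEIN-PROTEIN INTERACTION', 'Toy pair without effector');
        INSERT INTO component_sequences VALUES
            (10, 'P40337'), (11, 'O60885'), (12, 'P00533');
        INSERT INTO target_components VALUES
            (1, 10), (1, 11), (2, 10), (3, 11), (3, 12);
    """)
    return conn


def find_tpd_targets(conn, accessions):
    """Return (tid, pref_name) for PPI targets containing an effector accession."""
    placeholders = ",".join("?" for _ in accessions)
    query = f"""
        SELECT DISTINCT td.tid, td.pref_name
        FROM target_dictionary td
        JOIN target_components tc ON tc.tid = td.tid
        JOIN component_sequences cs ON cs.component_id = tc.component_id
        WHERE td.target_type = 'PROTEIN-PROTEIN INTERACTION'
          AND cs.accession IN ({placeholders})
        ORDER BY td.tid
    """
    return conn.execute(query, sorted(accessions)).fetchall()


if __name__ == "__main__":
    for tid, name in find_tpd_targets(build_toy_db(), EFFECTOR_ACCESSIONS):
        print(tid, name)
```

In this toy data, the single-protein effector target and the interaction target without an effector are both filtered out, which is the effect of combining the accession list with the target-type filter.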
2. Keyword-based approach:
TPD literature and curated TPD assays in ChEMBL were reviewed to produce a curated list of TPD-specific terms. A RegExp was generated to mine ChEMBL assay descriptions for relevant keywords.
Example: Protac activity at VHL/CRBN in HEK293 cells assessed as pVHL19 protein abundance at 10 nM after 4 hrs by Western blot analysis relative to DMSO (Protac is the keyword in this assay).
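A minimal sketch of the keyword idea, using Python's `re` module, is shown here. The keyword list is illustrative and much shorter than a curated list would be; the names `TPD_KEYWORDS`, `TPD_PATTERN` and `tpd_keywords_in` are made up for this example and are not taken from our actual query.

```python
import re

# Illustrative keywords only; the curated TPD term list is not reproduced here.
TPD_KEYWORDS = [
    "protac", "degrader", "molecular glue", "attec",
    "lytac", "hemtac", "ribotac", "traftac", "dc50",
]

# Case-insensitive, whole-word match with an optional plural "s".
TPD_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in TPD_KEYWORDS) + r")s?\b",
    re.IGNORECASE,
)


def tpd_keywords_in(description):
    """Return the sorted, lower-cased TPD keywords found in an assay description."""
    return sorted({m.group(0).lower() for m in TPD_PATTERN.finditer(description)})


if __name__ == "__main__":
    example = (
        "Protac activity at VHL/CRBN in HEK293 cells assessed as pVHL19 "
        "protein abundance at 10 nM after 4 hrs by Western blot analysis "
        "relative to DMSO"
    )
    print(tpd_keywords_in(example))
    print(tpd_keywords_in("Inhibition of EGFR kinase activity"))
```

Matching on word boundaries keeps short terms from firing inside unrelated words, at the cost of missing spelling variants that a curated expression would need to list explicitly.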
Combining the output from the accession and keyword-based methods provided a TPD-enriched dataset whilst keeping noise low. The TPD bioactivity dataset from ChEMBL 36, including assays, activities, targets, and compound structural information, is provided here. The extracted dataset outlines experimental methods for studying degrader-target interactions alongside relevant bioactivity measurements (DC50, Kd, etc.) allowing users to assess degradation efficiency. This dataset can be used as a starting point to access additional required data such as pharmacokinetic records and other associated activities, offering a low-maintenance approach to enrich for TPD data with minimal manual review.
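One simple way to combine the two outputs is to merge the assay identifiers from each method while recording which method found each assay. The sketch below assumes a union of the two sets, which is an assumption for illustration rather than a statement of how the released dataset was built; the assay identifiers and the function name are hypothetical.

```python
def combine_assay_hits(accession_hits, keyword_hits):
    """Union two collections of assay IDs and record the source of each hit."""
    accession_hits = set(accession_hits)
    keyword_hits = set(keyword_hits)
    combined = {}
    for assay_id in sorted(accession_hits | keyword_hits):
        sources = []
        if assay_id in accession_hits:
            sources.append("accession")
        if assay_id in keyword_hits:
            sources.append("keyword")
        combined[assay_id] = sources
    return combined


if __name__ == "__main__":
    # Hypothetical assay identifiers, not real ChEMBL IDs.
    by_accession = ["ASSAY_A", "ASSAY_B"]
    by_keyword = ["ASSAY_B", "ASSAY_C"]
    for assay_id, sources in combine_assay_hits(by_accession, by_keyword).items():
        print(assay_id, "+".join(sources))
```

Keeping the source labels makes it easy to review assays found by only one method, which is where most false positives would be expected.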
Since our approaches identified TPD-related data at the assay-level, it is expected that a few small molecule inhibitors may be present since these compounds may have been used as controls. Molecular glues were also included in the query, as were degraders using hydrophobic tags (HyT) where the effector is not defined. However, targeted protein modulators (e.g. PhosTAC, ACETAG) are excluded, with the potential exception of certain molecular glues. This is an ongoing project but for those interested in replicating our approach ahead of time, we’ve included a link to some SQL and a curated list of accessions.
Still want more TPD data?
Additional effort has been made to curate legacy TPD data. This curation will be available with our next release (ChEMBL 37). However, please get in touch with us if you’re interested in accessing the intermediate curation ahead of time, or if you would like to discuss ChEMBL’s TPD data in general, or to support further work in this area. You’re also welcome to discuss TPD data with the team at our upcoming UGM.
We hope this inspires further analysis and tool development.
Questions?
Reach out via the Helpdesk for any questions or discuss with the team at our UGM.
Authors: Zainab Ashimiyu-Abdusalam (ChEMBL intern), Emma Manners, and the ChEMBL Team.