ChEMBL 37 arrives with nearly three million compounds, a new modality field for capturing targeted protein degraders, and a wave of data quality improvements.
A new field for targeted protein degradation
Around 29,000 bioactivity data points have been annotated with ACTIVITIES.MODALITY = “Targeted Protein Degradation”, including around 3,000 legacy bioactivities from approximately 1,200 remapped assays. This marks ChEMBL’s first systematic effort to flag TPD data across the database.
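As a rough illustration, the new flag can be filtered with a plain equality test on the MODALITY column. The sketch below is hedged. It assumes a local ChEMBL 37 SQLite dump in which ACTIVITIES has an ACTIVITY_ID column and a text MODALITY column holding the exact label quoted above. The function name `tpd_activity_ids` is hypothetical, and the toy in-memory table stands in for the real data.

```python
import sqlite3

# Modality label as quoted in the ChEMBL 37 release announcement.
TPD_MODALITY = "Targeted Protein Degradation"


def tpd_activity_ids(conn: sqlite3.Connection) -> list[int]:
    """Return ACTIVITY_IDs whose MODALITY equals the TPD label.

    Assumption: ACTIVITIES has ACTIVITY_ID and a text MODALITY column, as in
    a ChEMBL 37 SQLite dump; check the official schema before relying on it.
    """
    rows = conn.execute(
        "SELECT activity_id FROM activities WHERE modality = ? ORDER BY activity_id",
        (TPD_MODALITY,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    # Toy in-memory table standing in for the real ACTIVITIES table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE activities (activity_id INTEGER, modality TEXT)")
    conn.executemany(
        "INSERT INTO activities VALUES (?, ?)",
        [(1, TPD_MODALITY), (2, None), (3, TPD_MODALITY)],
    )
    print(tpd_activity_ids(conn))  # [1, 3]
```

The query binds the label as a parameter rather than splicing it into the SQL string, so the same function works unchanged if the label has to be adjusted.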
Extensive update of literature data
Scientific literature remains the backbone of ChEMBL, and ChEMBL 37 brings one of the more substantial literature updates in recent releases. Compared to ChEMBL 36, the literature source gained approximately 79,000 new assays, 270,000 new bioactivity data points, and 52,000 new compound records — pushing the total literature-derived activity count past 9.55 million.
Twelve new chemical probe datasets
ChEMBL 37 adds twelve new datasets for the Donated Chemical Probes (DCP) source. The chemical probe flag (MOLECULE_DICTIONARY.CHEMICAL_PROBE) has also been refreshed using the latest data from chemicalprobes.org and probes-drugs.org, extracted in April 2026.
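If you want to pull out the flagged probes, a minimal sketch could look like the one below. It assumes a SQLite dump in which MOLECULE_DICTIONARY has a CHEMBL_ID column and stores CHEMICAL_PROBE as an integer, with 1 meaning "probe". That encoding is an assumption, and the function name is hypothetical.

```python
import sqlite3


def chemical_probe_chembl_ids(conn: sqlite3.Connection) -> list[str]:
    """Return CHEMBL_IDs of molecules flagged as chemical probes.

    Assumption: MOLECULE_DICTIONARY stores the flag as an integer
    CHEMICAL_PROBE column where 1 means "probe"; other values (0, NULL)
    are treated as not flagged.
    """
    rows = conn.execute(
        "SELECT chembl_id FROM molecule_dictionary "
        "WHERE chemical_probe = 1 ORDER BY chembl_id"
    ).fetchall()
    return [row[0] for row in rows]
```

Running the same query against ChEMBL 36 and ChEMBL 37 and comparing the two lists is one simple way to see how the refreshed flag changed.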
Chemical structure curation updates
• Approximately 400 tautomer duplicates that escaped InChI detection have been merged using a SMILES tautomer hash, meaningfully improving compound uniqueness.
• Over 3,000 legacy BindingDB structures have been manually curated, addressing a backlog of structural inconsistencies.
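The idea behind hash-based tautomer merging is that every record gets a tautomer-insensitive key, and records sharing a key collapse into one compound. The sketch below shows only that grouping step. It assumes the hashes have already been computed by some SMILES tautomer-hashing tool, which is not shown. The "keep the smallest ID" policy and the function name are hypothetical and are not taken from ChEMBL's own pipeline.

```python
from collections import defaultdict


def plan_tautomer_merges(records):
    """Map each duplicate compound ID to the ID it should be merged into.

    records: iterable of (compound_id, tautomer_hash) pairs. The hash is
    assumed to come from a tautomer-insensitive SMILES hashing step that is
    not shown here; this function only does the grouping.
    Policy (hypothetical): the smallest ID in each group is kept.
    """
    groups = defaultdict(list)
    for compound_id, taut_hash in records:
        groups[taut_hash].append(compound_id)

    merges = {}
    for ids in groups.values():
        keeper = min(ids)
        for compound_id in ids:
            if compound_id != keeper:
                merges[compound_id] = keeper
    return merges


# Two records share a hash (tautomers of one compound); one is unique.
print(plan_tautomer_merges([(205, "h1"), (101, "h1"), (300, "h2")]))
# {205: 101}
```

The function returns a merge plan instead of deleting anything. This leaves room for a manual review step before records are actually merged.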
Organism targets and assay annotations
Legacy organism-based assays — predominantly antimicrobial data — have received two waves of curation. First, approximately 400 new organism entries have been added to the TARGET_DICTIONARY, following remapping aligned with the NCBI taxonomy. Second, around 7,300 assays have been updated with ASSAY_STRAIN information that was present in assay descriptions but missing from the structured field.
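To see the newly populated strain information for organism-level assays, you could join ASSAYS to TARGET_DICTIONARY. This hedged sketch assumes the columns ASSAYS(CHEMBL_ID, TID, ASSAY_STRAIN) and TARGET_DICTIONARY(TID, TARGET_TYPE), and that whole-organism targets carry TARGET_TYPE = 'ORGANISM'. Verify these against the schema documentation. The function name is hypothetical.

```python
import sqlite3


def organism_assay_strains(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Return (assay CHEMBL_ID, ASSAY_STRAIN) pairs for organism-targeted assays.

    Assumed columns: ASSAYS(CHEMBL_ID, TID, ASSAY_STRAIN) and
    TARGET_DICTIONARY(TID, TARGET_TYPE), with TARGET_TYPE = 'ORGANISM'
    for whole-organism targets. Assays without a strain are skipped.
    """
    return conn.execute(
        """
        SELECT a.chembl_id, a.assay_strain
        FROM assays AS a
        JOIN target_dictionary AS t ON a.tid = t.tid
        WHERE t.target_type = 'ORGANISM'
          AND a.assay_strain IS NOT NULL
        ORDER BY a.chembl_id
        """
    ).fetchall()
```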
Preferred names for tissue targets have also been harmonised between TARGET_DICTIONARY and TISSUE_DICTIONARY using UBERON ontology names.
Schema changes to be aware of
Two tables and two fields have been deprecated since ChEMBL 36. Queries or pipelines that reference these items will need to be updated.
| Type | Deprecated Item |
|---|---|
| Table | CURATION_LOOKUP |
| Table | PROTEIN_CLASS_SYNONYMS |
| Field | CELL_DICTIONARY.CL_LINCS_ID |
| Field | ASSAYS.CURATED_BY |
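A quick way to find affected queries is to scan your SQL text for the deprecated names. The heuristic below is a sketch, not an official tool, and the function name is hypothetical. Tables are matched as whole words. Fields are matched by column name alone, because queries often reach them through aliases such as `a.curated_by`. As a result, a same-named column in an unrelated table would also be reported.

```python
import re

# Deprecated items as listed in the table above.
DEPRECATED_TABLES = ("CURATION_LOOKUP", "PROTEIN_CLASS_SYNONYMS")
DEPRECATED_FIELDS = ("CELL_DICTIONARY.CL_LINCS_ID", "ASSAYS.CURATED_BY")


def find_deprecated(sql: str) -> list[str]:
    """Return the deprecated ChEMBL items that a SQL string appears to use.

    Case-insensitive, whole-word matching. Fields are matched on the column
    name only (the part after the dot), so aliased references are caught,
    at the cost of possible false positives.
    """
    hits = []
    for table in DEPRECATED_TABLES:
        if re.search(rf"\b{re.escape(table)}\b", sql, re.IGNORECASE):
            hits.append(table)
    for field in DEPRECATED_FIELDS:
        column = field.split(".", 1)[1]
        if re.search(rf"\b{re.escape(column)}\b", sql, re.IGNORECASE):
            hits.append(field)
    return hits


print(find_deprecated("SELECT a.curated_by FROM assays a JOIN curation_lookup c ON 1=1"))
# ['CURATION_LOOKUP', 'ASSAYS.CURATED_BY']
```

Over-reporting is deliberate here: a false positive costs a second look, whereas a missed reference breaks a pipeline.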
ChEMBL-RDF has also been updated to use the latest version of UniChem. Available sources in the RDF export have changed accordingly.
The full release notes can be found here.