We are pleased to announce the release of ChEMBL 33!
ChEMBL database version ChEMBL 33 release notes
___________________________________________
# This version of the database, prepared on 31/05/2023, contains:
2,399,743 compounds (of which 2,372,674 have mol files)
3,051,613 compound records (non-unique compounds)
20,334,684 activities
1,610,596 assays
15,398 targets
88,630 documents
BioAssay Data Source | Number of Assays | Number of Compound Records | Number of Activities
Scientific Literature | 1,556,406 | 1,707,714 | 8,422,975
Patent Bioactivity Data | 16,573 | 59,839 | 179,516
Donated Chemical Probes - SGC Frankfurt | 10,247 | 207 | 70,833
EUbOPEN Chemogenomic Library | 9,786 | 2,488 | 397,587
BindingDB Database | 4,117 | 137,338 | 204,256
TP-search Transporter Database | 3,592 | 4,383 | 6,765
PubChem BioAssays | 2,999 | 531,694 | 7,434,992
Literature data from EUbOPEN Chemogenomic Library | 2,842 | 709 | 2,842
FDA Approval Packages | 1,386 | 80 | 1,387
Sanger Institute Genomics of Drug Sensitivity in Cancer | 713 | 139 | 73,039
GSK Published Kinase Inhibitor Set | 456 | 1,101 | 169,451
Kuster lab chemical proteomics drug profiling | 325 | 243 | 70,505
Drugs for Neglected Diseases Initiative (DNDi) | 233 | 7,070 | 14,452
MMV Malaria Box | 138 | 8,438 | 45,158
Curated Drug Pharmacokinetic Data | 136 | 98 | 1,163
DrugMatrix | 134 | 1,529 | 494,046
MMV Pathogen Box | 88 | 1,574 | 6,256
SARS-CoV-2 Screening Data 2020-21 | 57 | 26,367 | 37,209
K4DD Project | 48 | 273 | 2,064
Gates Library compound collection | 37 | 224,440 | 1,482,491
CO-ADD antimicrobial screening data | 35 | 24,315 | 99,793
RESOLUTE - Research Empowerment on Solute Carriers | 34 | 93 | 96
Salvensis and LSHTM Schistosomiasis screening data | 31 | 262 | 1,222
Open Source Malaria Screening | 22 | 211 | 344
St Jude Malaria Screening | 16 | 1,524 | 5,456
WHO-TDR Malaria Screening | 16 | 740 | 5,853
AstraZeneca Deposited Data | 15 | 5,799 | 11,687
GSK Tuberculosis Screening | 15 | 826 | 1,814
Deposited Supplementary Bioactivity Data | 13 | 1,786 | 4,817
GSK Kinetoplastid Screening | 13 | 592 | 7,235
Curated Drug Metabolism Pathways | 11 | 867 | 11
MMV Malaria HGL | 10 | 141,662 | 295,295
HESi | 9 | 31 | 986
Winzeler Lab Plasmodium Screening Data | 7 | 78,603 | 399,067
St Jude Leishmania Screening | 6 | 13,643 | 42,105
GSK Malaria Screening | 6 | 13,533 | 81,198
Novartis Malaria Screening | 6 | 10,119 | 27,888
Fraunhofer HDAC6 | 4 | 5,632 | 11,680
Cardiff Schistosomiasis Dataset 2023 | 4 | 80 | 194
Harvard Malaria Screening | 4 | 37 | 111
IMI-CARE SARS-CoV-2 Data | 3 | 4,404 | 9,646
Open TG-GATEs | 2 | 160 | 210,708
Published Kinase Inhibitor Set 2 | 1 | 486 | 491
Compound-Only Data Source | Number of Compound Records
USP Dictionary of USAN and International Drug Names | 12,394
Clinical Candidates | 8,619
WHO Anatomical Therapeutic Chemical Classification | 3,424
Orange Book | 2,272
British National Formulary | 1,958
Gene Expression Atlas Compounds | 793
Prodrug active ingredients | 238
Manually Added Drugs | 228
International Nonproprietary Names | 227
Withdrawn Drugs | 225
HeCaToS Compounds | 96
External Project Compounds | 10
############################################
# Data changes since the last release:
############################################
# New Sources
"RESOLUTE - Research Empowerment on Solute Carriers" (src_id = 58): this dataset comprises 96 bioactivities measured in 34 assays on 32 SLC targets from the IMI-RESOLUTE project. RESOLUTE (https://re-solute.eu) is an EU-funded consortium working on the solute carrier (SLC) gene family in a public-private partnership. The consortium also develops new transport assays for selected SLCs.
Cardiff Schistosomiasis Dataset 2023 (src_id = 64): a library of 80 compounds was tested in vitro against different life-cycle stages of the parasite Schistosoma mansoni. The dataset is also available from the ChEMBL - Neglected Tropical Disease archive (https://chembl.gitbook.io/chembl-ntd/#deposited-set-26-3rd-march-2023-dataset-using-chembl-to-complement-schistosome-drug-discovery).
Literature data from EUbOPEN Chemogenomic Library (src_id = 65): 2,842 bioactivity measurements have been extracted from the primary literature by the SGC consortium to complement their Chemogenomic Library (src_id = 55). References to the primary literature are given in the ACTIVITY_PROPERTIES table (TEXT_VALUE and STANDARD_TEXT_VALUE fields).
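A minimal sketch of how these references might be pulled out of a local SQLite copy of ChEMBL 33 with Python's standard sqlite3 module. It assumes the lower-case table and column names used in the SQLite dump (activities.src_id, activity_properties.text_value and so on). These notes do not say which ACTIVITY_PROPERTIES.TYPE value labels the literature references, so the hypothetical helper `literature_references` returns every text-valued property and leaves the TYPE filter to you.

```python
import sqlite3
from typing import List, Tuple


def literature_references(conn: sqlite3.Connection, src_id: int = 65) -> List[Tuple]:
    """Return (activity_id, property type, text_value, standard_text_value) rows
    for all activities from one source (hypothetical helper).

    The TYPE used for literature references is not stated in the release
    notes, so every text-valued property is returned for inspection.
    """
    sql = """
        SELECT act.activity_id, ap.type, ap.text_value, ap.standard_text_value
        FROM activities AS act
        JOIN activity_properties AS ap ON ap.activity_id = act.activity_id
        WHERE act.src_id = ?
          AND (ap.text_value IS NOT NULL OR ap.standard_text_value IS NOT NULL)
        ORDER BY act.activity_id, ap.ap_id
    """
    return conn.execute(sql, (src_id,)).fetchall()
```

Usage would look like `literature_references(sqlite3.connect(path_to_db))`, where `path_to_db` stands for wherever the downloaded ChEMBL 33 SQLite file was extracted.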
# Updated Sources
Scientific Literature
EUbOPEN Chemogenomic Library
# New Deposited Datasets
CHEMBL5096127 - Using ChEMBL to complement schistosome drug discovery
CHEMBL5209563 - FFN206 based assay for SLC18A1 using HEK-293 SLC18A1 OE cells
CHEMBL5209564 - Superclomeleon biosensor based assay for SLC12A3 using HEK-293 SLC12A3 OE cells
CHEMBL5209565 - pH biosensor based assay for SLC16A3 using HEK-293 SLC16A3 OE cells
CHEMBL5209566 - Superclomeleon biosensor-based assay for SLC26A9 using HEK293 SLC26A9 JumpIn OE cells
CHEMBL5209567 - Membrane potential based assay for SLC2A9 using HEK-293 SLC2A9 OE cells
CHEMBL5209568 - Membrane potential based assay for SLC5A11 using HEK-293 SLC5A11 OE cells
CHEMBL5209569 - Membrane potential based assay for SLC6A8 using HEK-293 JumpIN SLC6A8 OE cells
CHEMBL5209570 - Membrane potential based assay for SLC6A12 using HEK-293 SLC6A12 OE cells
CHEMBL5209571 - Membrane potential based assay for SLC13A3 using HEK-293 SLC13A3 OE cells
CHEMBL5209572 - Membrane potential based assay for SLC22A4 using HEK-293 SLC22A4 OE cells
CHEMBL5209573 - Fluo-8 based assay for SLC24A2 using HEK293 SLC24A2 JumpIn OE cells
CHEMBL5209574 - Fluo-8 based assay for SLC24A4 using HEK293 JumpIn SLC24A4 OE cells
CHEMBL5209575 - Membrane potential based assay for SLC1A1 using HEK-293 SLC1A1 OE cells
CHEMBL5209576 - Membrane potential based assay for SLC5A7 using HEK-293 SLC5A7 OE cells
CHEMBL5209577 - Flow cytometry transport assay for SLC2A1 using HEK293 JumpIN TRex SLC2A1 WT-OE cells
CHEMBL5209578 - Flow cytometry transport assay for SLC2A2 using HEK293 JumpIN TRex SLC2A2 WT-OE cells
CHEMBL5209579 - Flow cytometry transport assay for SLC2A4 using HEK293 JumpIN TRex SLC2A4 WT-OE cells
CHEMBL5209580 - Flow cytometry transport assay for SLC2A3 using HEK293 JumpIN TRex SLC2A3 WT-OE cells
CHEMBL5209581 - Membrane potential based assay for SLC6A5 using HEK-293 SLC6A5 OE cells
CHEMBL5209582 - Membrane potential based assay for SLC6A6 using HEK-293 SLC6A6 OE cells
CHEMBL5209583 - pH biosensor based assay for SLC9B2 using HEK-293 SLC9B2 OE cells
CHEMBL5209584 - Membrane potential based assay for SLC15A2 using HEK-293 SLC15A2 OE cells
CHEMBL5209585 - FFN206 based assay for SLC18A2 using HEK-293 SLC18A2 OE cells
CHEMBL5209586 - Membrane potential-based assay for SLC34A1 using HEK293 JumpIn SLC34A1 OE cells
CHEMBL5209587 - Membrane potential based assay for SLC23A1 using HEK-293 SLC23A1 OE cells
CHEMBL5209588 - Membrane potential based assay for SLC6A9 using HEK-293 SLC6A9 OE cells
CHEMBL5209589 - Membrane potential based transport assay for SLC1A3 using HEK293 JumpIn SLC1A3 OE cells
CHEMBL5209590 - Membrane potential based transport assay for SLC7A3 using HEK293 JumpIn SLC7A3 OE cells
CHEMBL5209667 - EUbOPEN Chemical Probe Library 2
CHEMBL5209669 - NanoBRET assay results for EUbOPEN Chemogenomics Library 3
CHEMBL5209684 - Tm Shift (DSF) assay results for EUbOPEN Chemogenomics Library 3
CHEMBL5209801 - GPCR results for EUbOPEN Chemogenomics Library 3
CHEMBL5209897 - Affinity Phenotypic Cellular Literature for EUbOPEN Chemogenomics Library wave 3
CHEMBL5210121 - Affinity On-target Cellular Literature for EUbOPEN Chemogenomics Library wave 3
CHEMBL5210307 - Affinity Biochemical Literature for EUbOPEN Chemogenomics Library wave 3
CHEMBL5212743 - Selectivity Literature for EUbOPEN Chemogenomics Library wave 3
############################################
# Database changes since the last release:
############################################
# New Database Tables:
CHEMBL_RELEASE table: this table links each ChEMBL release (aka version) to its CREATION_DATE.
# New Database Fields:
CHEMBL_RELEASE.CHEMBL_RELEASE_ID (Primary Key; links to DOCS.CHEMBL_RELEASE_ID)
CHEMBL_RELEASE.CHEMBL_RELEASE: ChEMBL release name
CHEMBL_RELEASE.CREATION_DATE: ChEMBL release creation date
DOCS.CHEMBL_RELEASE_ID (Foreign Key; links to CHEMBL_RELEASE.CHEMBL_RELEASE_ID): every document can now be linked via the CHEMBL_RELEASE_ID to the new CHEMBL_RELEASE table, which allows retrieving the CREATION_DATE for each document
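A minimal sketch, assuming a local SQLite copy of ChEMBL 33 with lower-case table and column names, of following DOCS.CHEMBL_RELEASE_ID to the new CHEMBL_RELEASE table to get the release name and CREATION_DATE for a set of document ChEMBL IDs. The helper `document_release_dates` is hypothetical; the IDs are passed as bound parameters rather than pasted into the SQL string.

```python
import sqlite3
from typing import Iterable, List, Tuple


def document_release_dates(
    conn: sqlite3.Connection, doc_chembl_ids: Iterable[str]
) -> List[Tuple[str, str, str]]:
    """Return (document ChEMBL ID, release name, release creation date) rows
    by joining DOCS to CHEMBL_RELEASE on CHEMBL_RELEASE_ID (hypothetical helper)."""
    ids = list(doc_chembl_ids)
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)  # one bound parameter per ID
    sql = f"""
        SELECT d.chembl_id, r.chembl_release, r.creation_date
        FROM docs AS d
        JOIN chembl_release AS r ON r.chembl_release_id = d.chembl_release_id
        WHERE d.chembl_id IN ({placeholders})
        ORDER BY d.chembl_id
    """
    return conn.execute(sql, ids).fetchall()
```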
ACTIVITIES.ACTION_TYPE (Foreign Key to ACTION_TYPE.ACTION_TYPE):
The ACTION_TYPE field has been added to the ACTIVITIES table and provides additional detail on the mode of action of the tested compounds in the specific assay setup. The recorded ACTION_TYPE must match one of the names in the ACTION_TYPE table. This field was populated with mode-of-action information that had previously been recorded as metadata in the ASSAY_PARAMETERS and ACTIVITY_PROPERTIES tables. In addition, approximately 250,000 activities have been manually annotated with an ACTION_TYPE by the ChEMBL data extractors. This initial subset of curated activities is being released as a test set, and we encourage feedback. As the annotation rules are more clearly defined and atypical cases are identified, a small number of annotations may change over the coming releases.
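As an illustration, the sketch below counts the activities of one assay per ACTION_TYPE. It assumes a local SQLite copy of ChEMBL 33 with lower-case names and an ACTION_TYPE table whose columns include ACTION_TYPE and PARENT_TYPE. The LEFT JOIN keeps activities without an annotation, which appear with a NULL (None) action type. `action_type_counts` is a hypothetical helper name.

```python
import sqlite3
from typing import List, Optional, Tuple


def action_type_counts(
    conn: sqlite3.Connection, assay_id: int
) -> List[Tuple[Optional[str], Optional[str], int]]:
    """Return (action_type, parent_type, number of activities) for one assay,
    most frequent first; un-annotated activities appear as (None, None, n)."""
    sql = """
        SELECT act.action_type, at.parent_type, COUNT(*) AS n
        FROM activities AS act
        LEFT JOIN action_type AS at ON at.action_type = act.action_type
        WHERE act.assay_id = ?
        GROUP BY act.action_type, at.parent_type
        ORDER BY n DESC, act.action_type
    """
    return conn.execute(sql, (assay_id,)).fetchall()
```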
MOLECULE_DICTIONARY.NATURAL_PRODUCT: Indicates whether the compound is a natural product as defined by COCONUT (https://coconut.naturalproducts.net/), the COlleCtion of Open Natural ProdUcTs (1 = yes, 0 = default value). The data set was retrieved from the COCONUT team on 05/05/2023. For the structure mapping, stereochemical information was stripped from the ChEMBL compound structures, because COCONUT structures did not include stereochemistry at the time the mapping was performed.
MOLECULE_DICTIONARY.CHEMICAL_PROBE: Indicates whether the compound is a chemical probe as defined by chemicalprobes.org (1 = yes, 0 = default value). The set of chemical probes was retrieved from the chemicalprobes.org website on 30/05/2023 and filtered for probes assigned an In Vivo Rating or In Cell Rating of 3 stars or more.
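A minimal sketch, assuming a local SQLite copy of ChEMBL 33 with lower-case column names, that uses the two new flags to list chemical probes, optionally restricted to those that are also flagged as natural products. The helper name `chemical_probes` is hypothetical.

```python
import sqlite3
from typing import List, Tuple


def chemical_probes(
    conn: sqlite3.Connection, natural_products_only: bool = False
) -> List[Tuple[str, str]]:
    """Return (chembl_id, pref_name) for compounds with CHEMICAL_PROBE = 1,
    optionally also requiring NATURAL_PRODUCT = 1 (hypothetical helper)."""
    sql = "SELECT chembl_id, pref_name FROM molecule_dictionary WHERE chemical_probe = 1"
    if natural_products_only:
        sql += " AND natural_product = 1"  # both flags use 1 = yes, 0 = default
    sql += " ORDER BY chembl_id"
    return conn.execute(sql).fetchall()
```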
# Data changes and amendments
Missing DOCS.YEAR values were added for 385 patents (src_id = 38).
AUC records (identified by STANDARD_TYPE) were all converted from uM.hr to ng.hr.mL-1 using the molecular weight of the parent compound. The STANDARD_VALUE and STANDARD_UNITS fields were updated accordingly; 10,219 records were affected.
Formula: STANDARD_VALUE * MW_FREEBASE
where STANDARD_VALUE is in uM.hr and MW_FREEBASE in g/mol
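To make the unit bookkeeping explicit, here is a small Python sketch of the same AUC conversion (the function and constant names are illustrative, not part of ChEMBL). Since 1 uM is 1e-6 mol/L, multiplying by a molecular weight in g/mol gives 1e-6 * MW g/L, which is MW ng/mL, so the step-by-step factors cancel to STANDARD_VALUE * MW_FREEBASE.

```python
UMOL_PER_MOL = 1e6  # micromoles in one mole
NG_PER_G = 1e9      # nanograms in one gram
ML_PER_L = 1e3      # millilitres in one litre


def auc_um_hr_to_ng_hr_ml(auc_um_hr: float, mw_freebase: float) -> float:
    """Convert an AUC from uM.hr (umol.hr/L) to ng.hr.mL-1; mw_freebase in g/mol."""
    mol_hr_per_l = auc_um_hr / UMOL_PER_MOL   # umol.hr/L -> mol.hr/L
    g_hr_per_l = mol_hr_per_l * mw_freebase   # mol -> g via the parent MW
    return g_hr_per_l * NG_PER_G / ML_PER_L   # g.hr/L -> ng.hr/mL (net factor 1)
```

For instance, an AUC of 2.0 uM.hr for a parent compound with MW_FREEBASE = 250.0 g/mol becomes 500 ng.hr.mL-1, up to floating-point rounding.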
Cmax records (identified by STANDARD_TYPE) were all converted from ug.mL-1 to nM using the molecular weight of the parent compound. The STANDARD_VALUE and STANDARD_UNITS fields were updated accordingly; 14,485 records were affected.
Formula: STANDARD_VALUE / MW_FREEBASE * 10^6
where STANDARD_VALUE is in ug.mL-1 and MW_FREEBASE in g/mol
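The same kind of sketch for the Cmax conversion (names again illustrative): 1 ug/mL is 1e-3 g/L, dividing by MW_FREEBASE gives mol/L, and multiplying by 1e9 gives nM, so the combined factor is 1e6, matching STANDARD_VALUE / MW_FREEBASE * 10^6.

```python
G_PER_UG = 1e-6      # grams in one microgram
ML_PER_L = 1e3       # millilitres in one litre
NMOL_PER_MOL = 1e9   # nanomoles in one mole


def cmax_ug_ml_to_nm(cmax_ug_ml: float, mw_freebase: float) -> float:
    """Convert a Cmax from ug.mL-1 to nM (nmol/L); mw_freebase in g/mol."""
    g_per_l = cmax_ug_ml * G_PER_UG * ML_PER_L  # ug/mL -> g/L
    mol_per_l = g_per_l / mw_freebase           # g/L -> mol/L via the parent MW
    return mol_per_l * NMOL_PER_MOL             # mol/L -> nmol/L (nM)
```

For example, a Cmax of 1.0 ug.mL-1 for a parent compound with MW_FREEBASE = 500.0 g/mol corresponds to 2,000 nM.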
The tissue annotation was removed from approximately 50,000 legacy assays. These were cell-based experiments in which no tissue was present, and the tissue of origin of the cells had been incorrectly used to populate the tissue field.
Please note that Oracle 19c dumps will be discontinued after ChEMBL 34.
# Funding acknowledgements:
Work contributing to ChEMBL 33 was funded by the Wellcome Trust, EMBL Member States, Open Targets, National Institutes of Health (NIH), EU Innovative Medicines Initiative (IMI) and EU Framework 7 programmes. Please see https://www.ebi.ac.uk/chembl/funding for more details.
# To receive updates when new versions of ChEMBL are available, please sign up to our mailing list: http://listserver.ebi.ac.uk/mailman/listinfo/chembl-announce
# To receive updates about submitting your data to ChEMBL, please sign up to our deposition mailing list: https://listserver.ebi.ac.uk/mailman/listinfo/chembl-depositors
# For general queries/feedback please email: chembl-help@ebi.ac.uk
# For details of upcoming webinars, please see: http://chembl.blogspot.com/search/label/Webinar