Skip to main content

ChEMBL 25 and new web interface released

We are pleased to announce the release of ChEMBL 25 and our new web interface. This version of the database, prepared on 10/12/2018 contains:

  • 2,335,417 compound records
  • 1,879,206 compounds (of which 1,870,461 have mol files)
  • 15,504,603 activities
  • 1,125,387 assays
  • 12,482 targets
  • 72,271 documents


Data can be downloaded from the ChEMBL ftp site: ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_25

Please see ChEMBL_25 release notes for full details of all changes in this release: ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_25/chembl_25_release_notes.txt


DATA CHANGES SINCE THE LAST RELEASE

# Deposited Data Sets:

Kuster Lab Chemical Proteomics Drug Profiling (src_id = 48, Document ChEMBL_ID = CHEMBL3991601):
Data have been included from the publication: The target landscape of clinical kinase drugs. Klaeger S, Heinzlmeir S and Wilhelm M et al (2017), Science, 358-6367 (https://doi.org/10.1126/science.aan4368)

# In Vivo Assay Classification:

A classification scheme has been created for in vivo assays. This is stored in the ASSAY_CLASSIFICATION table in the database schema and consists of a three-level classification. Level 1 corresponds to the top-levels of the ATC classification i.e., anatomical system/therapeutic area (e.g., CARDIOVASCULAR SYSTEM, MUSCULO-SKELETAL SYSTEM, NERVOUS SYSTEM). Level 2 provides a more fine-grained classification of the phenotype or biological process being studied (e.g., Learning and Memory, Anti-Obesity Activity, Gastric Function). Level three represents the specific in vivo assay being performed (e.g., Laser Induced Thrombosis, Hypoxia Tolerance Test in Rats, Paw Edema Test) and is assigned a specific ASSAY_CLASS_ID. Individual in vivo assays in the ChEMBL ASSAYS table are mapped to reference in-vivo assays in the ASSAY_CLASSIFICATION table via the ASSAY_CLASS_MAP table. More information about the classification scheme is available in the following publication: https://doi.org/10.1038/sdata.2018.230. The assay classification is available via web services and will be included in the ChEMBL web interface in the near future.

# Updated Data Sets:
Scientific Literature
Patent Bioactivity Data
BindingDB Database (corrections to compound structures)


WEB INTERFACE/WEB SERVICE CHANGES SINCE THE LAST RELEASE

# Web Interface:

The new ChEMBL web interface is now live at https://www.ebi.ac.uk/chembl (this replaces the previous beta version). The old ChEMBL web interface will be retired before the ChEMBL_26 release, but is available on the following URL until then: https://www.ebi.ac.uk/chembl/old. The new interface provides richer search and filtering capabilities. Documentation regarding this new functionality and frequently asked questions are available on our help pages: https://chembl.gitbook.io/chembl-interface-documentation/

# Changes to Web Services:

The Assay web service has been updated to include both assay_parameters and the in vivo assay classification for an assay (where applicable):
https://www.ebi.ac.uk/chembl/api/data/assay

A separate endpoint has also been created for the in vivo assay classification:
https://www.ebi.ac.uk/chembl/api/data/assay_class

The Activity web service has been updated to include activity_properties. The 'published_type', 'published_relation', 'published_value' and 'published_units' fields have also been renamed to 'type', 'relation', 'value' and 'units':
https://www.ebi.ac.uk/chembl/api/data/activity

A new endpoint has been created to retrieve supplementary data associated with an activity measurement (or list of measurements):
https://www.ebi.ac.uk/chembl/api/data/activity_supplementary_data_by_activity


SCHEMA CHANGES SINCE THE LAST RELEASE

# Tables Added:

ASSAY_CLASSIFICATION:
Classification scheme for phenotypic assays e.g., by therapeutic area, phenotype/process and assay type. Can be used to find standard assays for a particular disease area or phenotype e.g., anti-obesity assays. Currently data are available only for in vivo efficacy assays

COLUMN_NAME DATA_TYPE COMMENT
ASSAY_CLASS_ID NUMBER(9,0) Primary key
L1 VARCHAR2(100) High level classification e.g., by anatomical/therapeutic area
L2 VARCHAR2(100) Mid-level classification e.g., by phenotype/biological process
L3 VARCHAR2(1000) Fine-grained classification e.g., by assay type
CLASS_TYPE VARCHAR2(50) The type of assay being classified e.g., in vivo efficacy
SOURCE VARCHAR2(50) Source from which the assay class was obtained

ASSAY_CLASS_MAP:
Mapping table linking assays to classes in the ASSAY_CLASSIFICATION table

COLUMN_NAME DATA_TYPE COMMENT
ASS_CLS_MAP_ID NUMBER Primary key.
ASSAY_ID NUMBER assay_id is the foreign key that maps to the 'assays' table
ASSAY_CLASS_ID NUMBER assay_class_id is the foreign key that maps to the 'assay_classification' table


# Columns Removed:

ACTIVITIES:
PUBLISHED_TYPE (DEPRECATED in ChEMBL_24, now removed, replaced by TYPE)
PUBLISHED_RELATION (DEPRECATED in ChEMBL_24, now removed, replaced by RELATION)
PUBLISHED_VALUE (DEPRECATED in ChEMBL_24, now removed, replaced by VALUE)
PUBLISHED_UNITS (DEPRECATED in ChEMBL_24, now removed, replaced by UNITS)


Funding Acknowledgements:
Work contributing to ChEMBL_25 was funded by the Wellcome Trust, EMBL Member States, Open Targets, National Institutes of Health (NIH) Common Fund, EU Innovative Medicines Initiative (IMI) and EU Framework 7 programmes. Please see https://www.ebi.ac.uk/chembl/funding for more details.

The ChEMBL Team

Comments

Popular posts from this blog

Here's a nice Christmas gift - ChEMBL 35 is out!

Use your well-deserved Christmas holidays to spend time with your loved ones and explore the new release of ChEMBL 35!            This fresh release comes with a wealth of new data sets and some new data sources as well. Examples include a total of 14 datasets deposited by by the ASAP ( AI-driven Structure-enabled Antiviral Platform) project, a new NTD data se t by Aberystwyth University on anti-schistosome activity, nine new chemical probe data sets, and seven new data sets for the Chemogenomic library of the EUbOPEN project. We also inlcuded a few new fields that do impr ove the provenance and FAIRness of the data we host in ChEMBL:  1) A CONTACT field has been added to the DOCs table which should contain a contact profile of someone willing to be contacted about details of the dataset (ideally an ORCID ID; up to 3 contacts can be provided). 2) In an effort to provide more detailed information about the source of a deposited dat...

Improvements in SureChEMBL's chemistry search and adoption of RDKit

    Dear SureChEMBL users, If you frequently rely on our "chemistry search" feature, today brings great news! We’ve recently implemented a major update that makes your search experience faster than ever. What's New? Last week, we upgraded our structure search engine by aligning it with the core code base used in ChEMBL . This update allows SureChEMBL to leverage our FPSim2 Python package , returning results in approximately one second. The similarity search relies on 256-bit RDKit -calculated ECFP4 fingerprints, and a single instance requires approximately 1 GB of RAM to run. SureChEMBL’s FPSim2 file is not currently available for download, but we are considering generating it periodicaly and have created it once for you to try in Google Colab ! For substructure searches, we now also use an RDKit -based solution via SubstructLibrary , which returns results several times faster than our previous implementation. Additionally, structure search results are now sorted by...

ChEMBL 34 is out!

We are delighted to announce the release of ChEMBL 34, which includes a full update to drug and clinical candidate drug data. This version of the database, prepared on 28/03/2024 contains:         2,431,025 compounds (of which 2,409,270 have mol files)         3,106,257 compound records (non-unique compounds)         20,772,701 activities         1,644,390 assays         15,598 targets         89,892 documents Data can be downloaded from the ChEMBL FTP site:  https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_34/ Please see ChEMBL_34 release notes for full details of all changes in this release:  https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_34/chembl_34_release_notes.txt New Data Sources European Medicines Agency (src_id = 66): European Medicines Agency's data correspond to EMA drugs prior to 20 January 2023 (excluding ...

Improved querying for SureChEMBL

    Dear SureChEMBL users, Earlier this year we ran a survey to identify what you, the users, would like to see next in SureChEMBL. Thank you for offering your feedback! This gave us the opportunity to have some interesting discussions both internally and externally. While we can't publicly reveal precisely our plans for the coming months (everything will be delivered at the right time), we can at least say that improving the compound structure extraction quality is a priority. Unfortunately, the change won't happen overnight as reprocessing 167 millions patents takes a while. However, the good news is that the new generation of optical chemical structure recognition shows good performance, even for patent images! We hope we can share our results with you soon. So in the meantime, what are we doing? You may have noticed a few changes on the SureChEMBL main page. No more "Beta" flag since we consider the system to be stable enough (it does not mean that you will never ...

ChEMBL brings drug bioactivity data to the Protein Data Bank in Europe

In the quest to develop new drugs, understanding the 3D structure of molecules is crucial. Resources like the Protein Data Bank in Europe (PDBe) and the Cambridge Structural Database (CSD) provide these 3D blueprints for many biological molecules. However, researchers also need to know how these molecules interact with their biological target – their bioactivity. ChEMBL is a treasure trove of bioactivity data for countless drug-like molecules. It tells us how strongly a molecule binds to a target, how it affects a biological process, and even how it might be metabolized. But here's the catch: while ChEMBL provides extensive information on a molecule's activity and cross references to other data sources, it doesn't always tell us if a 3D structure is available for a specific drug-target complex. This can be a roadblock for researchers who need that structural information to design effective drugs. Therefore, connecting ChEMBL data with resources like PDBe and CSD is essen...