- 1,519,640 compound records
- 1,324,941 compounds (of which 1,318,187 have mol files)
- 12,077,491 activities
- 734,201 assays
- 9,356 targets
- 51,277 documents
You can download the data from the ChEMBL FTP site. For more information please read the release notes.
Data changes since the last release:
Drug mechanism of action
For all FDA-approved drugs, information regarding the mechanism of action and associated efficacy targets has been curated from primary sources, such as literature and drug prescribing information. Targets have only been included for a drug if a) the drug is believed to interact directly with the target and b) there is evidence that this interaction contributes towards the efficacy of that drug in the indication(s) for which it is approved.
Structures for around 3,200 metal-containing compounds have been removed from the database (though the bioactivity and other information for these compounds is retained). For more information, please see the previous blog post: http://chembl.blogspot.co.uk/2013/08/removal-of-metal-containing-compounds.html
New data sets
Several new deposited/extracted data sets have also been included in the latest release: two deposited data sets from GlaxoSmithKline for Ghrelin receptor agonists and Motilin receptor agonists; a data set from screening the MMV Malaria Box compound collection for activity against Schistosoma mansoni; two data sets from screening the GSK PKIS compound collection for inhibition of luciferase activity; and finally pathology data from the Open TG-GATES project.
Interface changes since the last release:
Browse Drug Targets tab
A new tab has been created to show the new mechanism of action information for FDA-approved drugs, together with the references from which the information was obtained and links to the relevant drug/target report card pages.
Document Report Card
A new table has been added to the document report card, showing other ChEMBL documents that are related to the current document. Pair-wise document similarity is assessed using two components. The first component is whether one document cites, or is cited by, the other. The second component is the amount of overlap between the compounds and biological targets reported in the two documents, quantified by the Tanimoto coefficient. Documents with the highest Tanimoto similarity scores to the query document are listed in this section. For example, the following page shows 5 additional ChEMBL documents that are deemed similar to the paper currently being viewed.
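To make the overlap component more concrete, here is a minimal sketch of a Tanimoto coefficient computed over sets of identifiers. It is an illustration only and not the code used by the interface: the document names, the compound and target identifiers, and the choice to pool compounds and targets into a single set per document are all assumptions made for the example.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient of two sets: |A & B| / |A | B| (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical documents, each mapped to the compound and target
# identifiers it reports (pooled into one set; this is an assumption).
docs = {
    "DOC_A": {"CPD1", "CPD2", "CPD3", "TGT1"},
    "DOC_B": {"CPD2", "CPD3", "TGT1", "TGT2"},
    "DOC_C": {"CPD9", "TGT7"},
}


def most_similar(query: str, docs: dict, top_n: int = 5) -> list:
    """Rank every other document by Tanimoto similarity to the query document."""
    scores = [
        (name, tanimoto(docs[query], features))
        for name, features in docs.items()
        if name != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


ranked = most_similar("DOC_A", docs)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

In this toy data, DOC_A and DOC_B share three of five distinct identifiers, so DOC_B ranks first, while DOC_C shares none. The citation component described above is not modelled here.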
Database changes since the last release:
A number of new tables have been added to store the drug mechanism of action information (please see release notes and schema documentation for full details). In addition, a number of minor changes have been made to existing tables:
The PROTEIN_FAMILY_CLASSIFICATION table has been deprecated and replaced by a new hierarchical version: PROTEIN_CLASSIFICATION.
The MOLREGNO field has been removed from the ATC_CLASSIFICATION table and moved to a new mapping table: MOLECULE_ATC_CLASSIFICATION.
The MOLFORMULA field has been moved from the COMPOUND_STRUCTURES table to the COMPOUND_PROPERTIES table (and renamed).
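For existing queries that relied on MOLREGNO in ATC_CLASSIFICATION, the change means going through the new mapping table instead. The sketch below builds a toy in-memory SQLite database to show the shape of such a join. The column names (level5, who_name, mol_atc_id) and the sample row are assumptions for illustration; please check the schema documentation for the actual definitions.

```python
import sqlite3

# Toy in-memory database; column names and values are illustrative
# assumptions, not taken from the release notes.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE atc_classification (
        level5   TEXT PRIMARY KEY,
        who_name TEXT
    );
    CREATE TABLE molecule_atc_classification (
        mol_atc_id INTEGER PRIMARY KEY,
        level5     TEXT REFERENCES atc_classification(level5),
        molregno   INTEGER
    );
    INSERT INTO atc_classification VALUES ('X00XX00', 'example drug');
    INSERT INTO molecule_atc_classification VALUES (1, 'X00XX00', 12345);
    """
)

# MOLREGNO now lives in the mapping table, so the lookup starts there
# and joins to ATC_CLASSIFICATION for the classification details.
rows = conn.execute(
    """
    SELECT m.molregno, a.level5, a.who_name
    FROM molecule_atc_classification m
    JOIN atc_classification a ON a.level5 = m.level5
    WHERE m.molregno = ?
    """,
    (12345,),
).fetchall()

print(rows)
```

A mapping table of this kind also allows one compound to be linked to several ATC codes, since each link is a separate row.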
The ChEMBL Team