Showing posts from 2016

Merry Christmas from ChEMBL

Wishing all of our many users and collaborators a very Merry Christmas and a Happy New Year! The ChEMBL Team

A comprehensive map of molecular drug targets

Within the ChEMBL database we spend a lot of time manually curating links between FDA-approved drugs and their efficacy targets. With collaborators from the University of New Mexico and the Institute of Cancer Research, we have now published an analysis of these drug efficacy targets:
Santos R, Ursu O, Gaulton A, Bento AP, Donadi RS, Bologa CG, Karlsson A, Al-Lazikani B, Hersey A, Oprea TI & Overington JP. A comprehensive map of molecular drug targets. Nature Reviews Drug Discovery (2016) doi:10.1038/nrd.2016.230
In the article we address the complexities of assigning drug targets, describe the 667 human proteins and 189 pathogen proteins through which 1,578 FDA-approved drugs act, and map each drug to its therapeutic indication via the WHO ATC classification system. We show that 70% of small molecule drugs still act through privileged families (GPCRs, ion channels, kinases and nuclear receptors), highlight the differences in innovation between different therapeutic areas,

New ChEMBL database paper out

The latest ChEMBL database paper is now available online: This paper describes some of the additions to ChEMBL over the last few releases (ChEMBL_18 to ChEMBL_22) such as drug indications and clinical candidates, patent bioactivity data from BindingDB, drug metabolism information and richer assay annotation. A number of papers from our collaborators will also feature in the 2017 NAR database issue, so watch this space...

ChEMBL_22 Data and Web Services Update

ChEMBL_22_1 data update: We would like to inform users that an update to ChEMBL_22 has been released. The new version, ChEMBL_22_1, corrects an issue with the targets assigned to some BindingDB assays in ChEMBL (src_id = 37). If you are using the BindingDB data from ChEMBL, we recommend that you download this update. This update also incorporates the mol file/canonical SMILES correction announced previously. Updates have been made to BindingDB data in the ASSAYS, ACTIVITIES, CHEMBL_ID_LOOKUP, LIGAND_EFF and PREDICTED_BINDING_DOMAINS tables. Corrections have also been made to molfiles and canonical_smiles in the COMPOUND_STRUCTURES table. No changes have been made to other data sets or to other drug/compound/target tables in ChEMBL_22. The new release files can be downloaded from: A new version of the ChEMBL RDF is also available from:
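To check whether a local copy of the database contains BindingDB assays affected by this update, you can count the ASSAYS rows with src_id = 37. The sketch below is only an illustration: it runs against a tiny in-memory SQLite table that stands in for the real ASSAYS table (just two columns), and the helper name count_assays_for_source is hypothetical.

```python
import sqlite3

BINDINGDB_SRC_ID = 37  # src_id quoted in the announcement for BindingDB assays


def count_assays_for_source(conn, src_id):
    """Return the number of rows in the assays table with the given src_id."""
    row = conn.execute(
        "SELECT COUNT(*) FROM assays WHERE src_id = ?", (src_id,)
    ).fetchone()
    return row[0]


# Toy stand-in for a real ChEMBL database: only the two columns used here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assays (assay_id INTEGER PRIMARY KEY, src_id INTEGER)")
conn.executemany(
    "INSERT INTO assays (assay_id, src_id) VALUES (?, ?)",
    [(1, 1), (2, 37), (3, 37), (4, 7)],
)

bindingdb_assays = count_assays_for_source(conn, BINDINGDB_SRC_ID)
print(bindingdb_assays)
```

With a real database you would pass a connection to your own ChEMBL instance instead of the in-memory one; the query itself stays the same.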

ChEMBL 22 release - technical notes

The ChEMBL 22 release brings lots of new data. We have also released some new software, so if you are interested in the technical details please read on.
1. First of all, please note that ChEMBL 22 is the last release for which we provide Oracle 9i dumps. Oracle 9i has been out of support for nearly a decade and shouldn't be in use anymore, but please let us know if this is a problem. On the other hand, we will do our best to provide Oracle 12c dumps for the next release.
2. If you are using the Python API client, please upgrade it by running: [sudo] pip install -U chembl_webresource_client This will upgrade the client to the latest version, which fixes some minor bugs and adds the ability to search in document abstracts. It will also create a new cache, so you will see new ChEMBL data immediately. Otherwise, you will need to clear your cache manually.
3. A new version (2.4.9) of the ChEMBL API has been released as well. This version includes: - new endpoints: tissu

ChEMBL 22 Released

We are pleased to announce the release of ChEMBL 22. This version of the database, prepared on 8th August 2016, contains:
• 2,043,051 compound records
• 1,686,695 compounds (of which 1,678,393 have mol files)
• 14,371,219 activities
• 1,246,132 assays
• 11,224 targets
• 65,213 documents
Data can be downloaded from the ChEMBL ftpsite or viewed via the ChEMBL interface. Please see ChEMBL_22 release notes for full details of all changes in this release.
CHANGES SINCE THE LAST RELEASE
In addition to the regular updates to the Scientific Literature, PubChem, FDA Orange Book and USP Dictionary of USAN and INN Investigational Drug Names, this release of ChEMBL also includes the following new data:
Deposited Data Sets: Two new deposited data sets have been included in ChEMBL_22: the MMV Pathogen Box compound set and GSK Tres Cantos Follow-up TB Screening Data.
Patent Data from BindingDB: We have wo

ChEMBL_22 is coming soon....

ChEMBL_22 will be released in the next week or two. For those of you who want to plan ahead, here is a preview of the new schema (full documentation here). We would also like to inform users that we plan to discontinue the Oracle 9i download format after this release. Please contact us as soon as possible if you rely on this version.

Join the EMBL-EBI Chemogenomics team!

We are currently seeking multiple talented individuals to join the Chemogenomics team here at EMBL-EBI, both to work on our group resources (ChEMBL, SureChEMBL) and support external projects (FP7 HeCaToS and NIH Illuminating the Druggable Genome). If you are interested in applying for these positions (or for more information) please follow the links below. The closing date for all positions is 12th June. Java Back End Developer: Web Developer: Scientific Programmer: Data Mining and Analysis Scientist: Biological Data Curator:

Target Prediction Models Update

In case you have been too busy to notice, ChEMBL_21 has arrived with the usual additions, improvements and enhancements, both on the data/annotation side and on the interface/services side. To complement this, we have also updated the target prediction models, which can be downloaded from our ftp here. The good news is that, besides the increase in training data (compounds and targets), the new models were built using the latest stable versions of RDKit (2015.09.2) and scikit-learn (0.17). The latter was upgraded from the much older 0.14 version, which was causing incompatibility issues (see MultiLabelBinarizer) for several of you while trying to use the models. We've also put together a quick Jupyter Notebook demo on how to get predictions from the models here: The new models will also be available on myChEMBL 21 along with a more detailed and elaborate Jupyter

This Python InChI Key resolver will blow your mind

This scientific clickbait title introduces our promised blog post about the integration of UniChem into our ChEMBL Python client. UniChem is a very important resource, as it contains information about 134 million (and counting) unique compound structures and cross references between various chemistry resources. Since UniChem is developed in-house and provides its own web services, we thought it would make sense to integrate it with our Python client library. Before we present a systematic translation between the raw HTTP calls described in the UniChem API documentation and client calls, let us provide some preliminary information: In order to install the client, you should use pip: pip install -U chembl_webresource_client Once you have it installed, you can import the unichem module: from chembl_webresource_client.unichem import unichem_client as unichem OK, so how do you resolve an InChI Key to an InChI string? It's very simple: Of course in order to reso
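Before sending an InChI Key to any resolver, it can be worth checking that the string is well formed. The sketch below uses only the standard library and checks the usual 27-character layout (14 letters, hyphen, 10 letters, hyphen, 1 letter). The helper name looks_like_inchikey is hypothetical and is not part of chembl_webresource_client; the check says nothing about whether the key exists in UniChem.

```python
import re

# InChIKey layout: 14 uppercase letters, hyphen, 10 uppercase letters,
# hyphen, 1 uppercase letter (27 characters in total).
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(text):
    """Return True if text has the 27-character InChIKey layout."""
    return INCHIKEY_PATTERN.fullmatch(text.strip()) is not None


print(looks_like_inchikey("AAOVKJBEBIDNHE-UHFFFAOYSA-N"))
print(looks_like_inchikey("not-an-inchikey"))
```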

ChEMBL 21 web services update

Traditionally, along with the release of the new ChEMBL version, we have made a few updates to our RESTful API. Below you can find a short description of the most important changes:
Data API:
1. New resources: Since ChEMBL 21 introduced a few new tables, we have made them available via the API. The new resources are: drug_indication, go_slim and metabolism. Moreover, the target_component endpoint has been enhanced to provide a list of related GO terms.
2. Solr-based search: a very popular feature request was the ability to search resources by keyword. A form of searching was already possible before, using filtering terms such as the [i]contains, [i]startswith and [i]endswith filters. For example, in order to search molecules for 'metazide' in their preferred name, this filter can be used: api/data/molecule?pref_name__icontains=metazide However, this approach has many drawbacks: it's executed on th
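The filter-style query mentioned in the excerpt above (field name, double underscore, filter type) can be assembled programmatically. This is a minimal sketch: the excerpt only gives the relative path api/data/..., so the BASE_URL value is an assumption, and build_filter_url is a hypothetical helper rather than part of any ChEMBL library. It only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

# Assumed base URL; the excerpt only shows the relative path "api/data/...".
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def build_filter_url(resource, **filters):
    """Build a URL for a resource with field__filter=value query terms."""
    query = urlencode(filters)
    url = f"{BASE_URL}/{resource}"
    return f"{url}?{query}" if query else url


url = build_filter_url("molecule", pref_name__icontains="metazide")
print(url)
# -> https://www.ebi.ac.uk/chembl/api/data/molecule?pref_name__icontains=metazide
```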

ChEMBL DB on SQLite, is that even possible?

Short answer: Yes; Andrew Dalke did it in 2014 for ChEMBL 19 compounds, but now it's officially supported by the ChEMBL team and covers the whole database. One thing you may notice when looking at the ChEMBL 21 FTP directory is a new file called chembl_21_sqlite.tar.gz. What's that? It's a binary SQLite database file containing all the ChEMBL 21 tables and data. If you don't know what SQLite is, it's a very lightweight database system that stores the entire database (definitions, tables, indices, and the data itself) as a single cross-platform file on a host machine. It's very popular as well, so if you have a Mac, Windows 10 or a Linux box, chances are that SQLite is already installed on your computer. Skype uses SQLite to store the local copy of conversation history, and the Python language has SQLite bundled as a core library. If it's so "lightweight", why is the SQLite ChEMBL 21 file 2.4GB, compared to less than 1.4GB for O
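Because Python bundles SQLite as the sqlite3 module, a single-file database like this can be inspected without installing anything else. The sketch below lists the tables in a database. To stay self-contained it uses an in-memory database with two toy tables standing in for the real file; list_tables is a hypothetical helper, and the path to the extracted ChEMBL file is left for you to fill in.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# In-memory stand-in; with the real download you would instead call
# sqlite3.connect("<path to the extracted database file>").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE molecule_dictionary (molregno INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE assays (assay_id INTEGER PRIMARY KEY)")

tables = list_tables(conn)
print(tables)
```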

ChEMBL 21 Released

We are pleased to announce the release of ChEMBL_21. This version of the database was prepared on 1st February 2016 and contains:
• 1,929,473 compound records
• 1,592,191 compounds (of which 1,583,897 have mol files)
• 13,968,617 activities
• 1,212,831 assays
• 11,019 targets
• 62,502 source documents
Data can be downloaded from the ChEMBL ftpsite or viewed via the ChEMBL interface. Please see ChEMBL_21 release notes for full details of all changes in this release.
CHANGES SINCE THE LAST RELEASE
In addition to the regular updates to the Scientific Literature, PubChem, FDA Orange Book and USP Dictionary of USAN and INN Investigational Drug Names, this release of ChEMBL also includes the following new data:
* Data Depositions: Eight new deposited data sets have been included in ChEMBL_21. These include HepG2 cell viability data for the Gates Library Compound Collection from the University of Dundee, three depositions from groups screening the

Forthcoming Conferences

There are a number of conferences and meetings coming up in the next few weeks that might be of interest:
Firstly, it's not too late to register for the KNIME Spring Summit in Berlin, 24th-26th February. More details here.
The next SME Forum will be held on the Wellcome Genome Campus at Hinxton near Cambridge on 7th and 8th March. Come and find out more about EMBL-EBI's freely available data resources including ChEMBL and SureChEMBL. More details on the meeting and registration here.
UKQSAR and Physchem Forum Joint Symposium: This is a two-day meeting being held on 15th to 16th March at Stevenage in the UK. There are a limited number of places still available and you must register (by 29th Feb) if you want to attend. More details can be found here.
Last but not least, consider going to the Spring ACS meeting in San Diego, 13th to 17th March, where there will be a couple of ChEMBL talks and if a