Friday, 19 May 2017

ChEMBL_23 released

We are pleased to announce the release of ChEMBL_23. This release was prepared on 1st May 2017 and contains:

* 2,101,843 compound records
* 1,735,442 compounds (of which 1,727,112 have mol files)
* 14,675,320 activities
* 1,302,147 assays
* 11,538 targets
* 67,722 source documents

Data can be downloaded from the ChEMBL ftp site:

Please see ChEMBL_23 release notes for full details of all changes in this release:


In addition to the regular updates to the Scientific Literature, FDA Orange Book and USP Dictionary of USAN and INN Investigational Drug Names and Clinical Candidates, this release of ChEMBL also includes the following new data:

Patent Bioactivity Data
With funding from the NIH Illuminating the Druggable Genome project, we have extracted bioactivity data relating to understudied druggable targets from a number of patent documents and added these data to ChEMBL.

Curated Drug Pharmacokinetic Data
We have manually extracted pharmacokinetic parameters for approved drugs from DailyMed drug labels. 

Drug information from British National Formulary and ATC classification
We have now included compound records for drugs that are in the WHO ATC classification or the British National Formulary (BNF). Currently only BNF drugs that already exist in ChEMBL have been assigned compound records. In future releases we will add new BNF drugs to ChEMBL.

Deposited Data Sets
CO-ADD, the Community for Open Antimicrobial Drug Discovery, is a global open-access screening initiative launched in February 2015 to uncover significant and rich chemical diversity held outside of corporate screening collections. CO-ADD provides unencumbered free antimicrobial screening for any interested academic researcher. CO-ADD has been recognised as a novel approach in the fight against superbugs by the Wellcome Trust, who have provided funding through their Strategic Awards initiative. Open Source Malaria (OSM) is aimed at finding new medicines for malaria using open source drug discovery, where all data and ideas are freely shared, there are no barriers to participation, and no restriction by patents. The initial set of deposited data from the CO-ADD project consists of OSM compounds screened in CO-ADD assays (DOI = 10.6019/CHEMBL3832881).

Modelled on the Malaria Box, the MMV Pathogen Box contains 400 diverse, drug-like molecules active against neglected diseases of interest and is available free of charge. The Pathogen Box compounds are supplied in 96-well plates containing 10 µL of a 10 mM dimethyl sulfoxide (DMSO) solution of each compound. Upon request, researchers around the world will receive a Pathogen Box of molecules to help catalyse neglected disease drug discovery. In return, researchers are asked to share any data generated in the public domain within 2 years, creating an open and collaborative forum for neglected diseases drug research. The initial set of assay data provided by MMV has now been included in ChEMBL (DOI = 10.6019/CHEMBL3832761).


Schema changes will be made in ChEMBL_24 to accommodate more complex data types. Details of these changes will be released soon. Please follow the ChEMBL blog or sign up to the ChEMBL announce mailing list for details.

Changes will also be made in ChEMBL_24 to the way some of the physicochemical properties are calculated. Details of these changes will be announced soon.

Funding acknowledgements:

Work contributing to ChEMBL_23 was funded by the Wellcome Trust, EMBL Member States, Open Targets, National Institutes of Health (NIH) Common Fund, EU Innovative Medicines Initiative (IMI) and EU Framework 7 programmes. Please see for more details.

The ChEMBL Team

If you require further information about ChEMBL, please contact us:

# To receive updates when new versions of ChEMBL are available, please sign up to our mailing list:
# For general queries/feedback please email:
# To report any problems with data content please email:
# For details of upcoming webinars, please see:

Wednesday, 5 April 2017

Technical internships at ChEMBL

We are looking for Computer Science students (or students in related fields) with strong programming skills to join our team for 3-6 month internships. This is not necessarily a summer internship programme: once accepted, you can start whenever is convenient for you. Please take a look at some of the research ideas / candidate profiles below:

1. Java programmer - we are looking for a person with experience in Java to develop a prototype of new KNIME nodes for interacting with the ChEMBL API. Experience with REST and/or KNIME is a plus but not a requirement - you can learn it during your internship. It is very important that you are excited about UX and about creating user-friendly and pragmatic GUIs.

2. C++ programmer - we would like to invite a person passionate about C++ and pattern recognition / image processing to experiment with optimising the open-source OSRA code. OSRA is like OCR but for molecules. We want to make it faster and more accurate.

3. C++ programmer with graph theory knowledge. Chemical compounds are represented as graphs in silico. We want to be able to quickly generate random graphs that would also be valid compounds. Experience with distributed computing, computing grids, network file systems and map-reduce is a plus but not required.

4. JavaScript programmer - "any application that can be written in JavaScript, will eventually be written in JavaScript". This is why we are looking for a person with JS experience to experiment with:
  • Creating prototypes of reusable chemical web widgets using polymer.
  • Using emscripten to cross compile some core chemical software written in C++ to JS.
5. A person with data visualisation skills to explore the Kibana and Kibi tools and create beautiful and informative datavis widgets from ChEMBL data.

6. Someone with a Natural Language Processing background to:
  • Create a dictionary of common spelling mistakes in chemistry patents.
  • Create a network of patent relations using the TextRank algorithm.
  • Explore different approaches to the Named Entity Classification problem.

How to apply?

Just send your CV to kholmes @ with 'ChEMBL Tech Internships' subject.

When to apply?

You can apply anytime but we will only contact selected candidates.

Will all those internships start at the same time?

No, in fact we are planning to select at most the two most interesting candidates at any given time.

Will I get paid?

The internship is either paid at 800 GBP per month or funded by your alma mater (whichever is better for you).

Sunday, 19 March 2017

Finding Compounds in Databases using UniChem

Have you ever identified an interesting compound and wondered what else is known about it? For example, is there any bioactivity data on it in ChEMBL or PubChem? Is there any toxicity data on it (CompTox)? Then, having found interesting data on a compound, have you wondered whether it can be purchased or whether it has been patented? All this can be done using UniChem. Interested?

Come along to our webinar on 29th March at 2pm BST (3pm CEST, 9am EDT).
You will, however, need to register by emailing chembl-help. Places are limited, so please let us know as soon as possible if you register but are then unable to attend.

If you want to know more about UniChem please read on.

UniChem is a simple system we have developed to cross-reference compounds across databases, both internal to EMBL-EBI and external. Currently we have cross-references to 140 million compounds in 30 different databases. Information about the sources indexed in UniChem can be found here. UniChem is updated weekly with new compounds from these source databases.

So, for example, you can input a database identifier or an InChIKey into UniChem and see links to all the other indexed databases that have information about that compound.

If we take the drug paroxetine and search for it in UniChem, it is found in 22 databases and the UniChem webpage gives links to the paroxetine entries in those databases.

You don’t have to do this compound by compound using the web interface, though. UniChem has a comprehensive set of web services that you can use to retrieve data. Alternatively, all the database files and source-to-source mapping files are available for download.
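As a rough illustration of scripting a lookup by InChIKey, the sketch below builds a request URL and groups a JSON response by source database. The base URL, the `inchikey` path and the `src_id` / `src_compound_id` field names are assumptions based on the UniChem REST interface as commonly documented and may have changed; the payload and its identifiers are made up, and the function names are hypothetical. No network call is made.

```python
import json
from typing import Dict, List
from urllib.parse import quote

# Assumption: base URL of the UniChem REST service; check the current docs.
UNICHEM_REST = "https://www.ebi.ac.uk/unichem/rest"


def inchikey_lookup_url(inchikey: str) -> str:
    """Build the (assumed) URL that lists all sources holding this InChIKey."""
    return f"{UNICHEM_REST}/inchikey/{quote(inchikey.strip())}"


def parse_cross_references(payload: str) -> Dict[str, List[str]]:
    """Group compound identifiers by source id.

    Assumes the response is a JSON list of objects with the keys
    'src_id' and 'src_compound_id'.
    """
    refs: Dict[str, List[str]] = {}
    for entry in json.loads(payload):
        refs.setdefault(str(entry["src_id"]), []).append(str(entry["src_compound_id"]))
    return refs


# Made-up payload in the assumed response shape (identifiers are placeholders).
SAMPLE_PAYLOAD = json.dumps([
    {"src_id": "1", "src_compound_id": "EXAMPLE_ID_A"},
    {"src_id": "22", "src_compound_id": "EXAMPLE_ID_B"},
    {"src_id": "22", "src_compound_id": "EXAMPLE_ID_C"},
])

url = inchikey_lookup_url(" AHOUBRCZNHFOSL-YOEHRIQHSA-N ")
cross_refs = parse_cross_references(SAMPLE_PAYLOAD)
```

In practice the URL would be fetched with an HTTP client of your choice and the response body passed to `parse_cross_references`; for large sets of compounds the downloadable mapping files are likely to be the better option.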

UniChem relies on the InChIKey to do the mapping between databases, and this works fine if two databases have exactly the same structure for a compound. We all know, however, that this isn’t always the case. Sometimes a different salt or isotope was tested, or a mistake was made in the stereocentre assignment, meaning the InChIKeys no longer match.

However, don’t despair: UniChem connectivity searching can help. Because the InChI is built up in layers, it can be deconstructed and compounds compared layer by layer, so that relationships between compounds that differ by stereochemistry, isotopes, protonation state, etc. can be identified and mapped. You can do this on single components or mixtures.
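The layering idea can be approximated locally with nothing more than InChIKeys. This is a minimal sketch, not UniChem's implementation (which works on the InChI layers themselves and also handles salts and mixtures): it assumes the standard InChIKey layout, in which the first 14-character block is a hash of the connectivity information and the second block covers the remaining layers, so keys sharing a first block share a skeleton. The function names are hypothetical and the keys are illustrative inputs that should be verified before reuse.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_inchikey(inchikey: str) -> Tuple[str, str, str]:
    """Split an InChIKey into (connectivity block, other-layers block, protonation flag)."""
    parts = inchikey.strip().upper().split("-")
    if len(parts) != 3 or [len(p) for p in parts] != [14, 10, 1]:
        raise ValueError(f"not a well-formed InChIKey: {inchikey!r}")
    return parts[0], parts[1], parts[2]


def group_by_connectivity(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (name, InChIKey) pairs by the connectivity block of the key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, key in records:
        groups[split_inchikey(key)[0]].append(name)
    return dict(groups)


# Illustrative records: two keys sharing a first block and one unrelated key.
records = [
    ("paroxetine", "AHOUBRCZNHFOSL-YOEHRIQHSA-N"),
    ("paroxetine, no stereochemistry", "AHOUBRCZNHFOSL-UHFFFAOYSA-N"),
    ("unrelated compound", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"),
]
groups = group_by_connectivity(records)
```

Here the two paroxetine entries fall into the same group even though their full InChIKeys differ, which is the kind of relationship a connectivity search is meant to surface. Different salts will not group together under this simplification, because the counter-ion changes the first block.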

Taking our paroxetine example:

We have paroxetine and a number of related compounds in ChEMBL. For example:
Maybe someone genuinely wanted to test these related compounds, or maybe they are errors (or a mixture of both). Whatever the reason, by using the UniChem connectivity searching feature we can identify any compounds that match paroxetine on the InChI connectivity layer.
The matches identified from a connectivity search starting with paroxetine can be found here:

At the webinar on 29th March we will describe how this is done in more detail and discuss some use cases.  If you are interested don’t forget to register.

If you want to read more here are links to two papers about UniChem:
Chambers, J., Davies, M., Gaulton, A., Hersey, A., Velankar, S., Petryszak, R., Hastings, J., Bellis, L., McGlinchey, S. and Overington, J.P.
UniChem: A Unified Chemical Structure Cross-Referencing and Identifier Tracking System.
Journal of Cheminformatics 2013, 5:3 (January 2013).

Chambers, J., Davies, M., Gaulton, A., Papadatos, G., Hersey, A. and Overington, J.P.
UniChem: extension of InChI-based compound mapping to salt, connectivity and stereochemistry layers.
Journal of Cheminformatics 2014, 6:43 (September 2014).

Tuesday, 14 March 2017

Chemogenomics Analyst Wanted

We are looking to recruit a scientist to support our work for the Horizon 2020 project “Coordinated Research Infrastructures Building Enduring Life-science services” (CORBEL). The role is to facilitate scientists in their use of chemogenomics resources by enabling database searching and evaluation of data.
  • To be responsible for liaising with scientists engaged in CORBEL and advising on the use of chemogenomics resources to progress their projects;
  • To help in the identification and analysis of bioactivity data from multiple database resources;
  • To construct and utilize appropriate workflows to facilitate the pharmacological profiling of molecules and chemotypes, the identification of potential off-target effects and the development of target prediction models;
  • To identify interoperability gaps between resources and help with developing solutions;
  • To organize and run appropriate training courses for scientists engaged in the CORBEL project.

For full details of the position, or to apply, see:

The closing date is 9th April 2017

Monday, 27 February 2017

Position to work on tractability in Open Targets

There is currently an opening for a Protein Computational Scientist to work on methods to assess and quantify the tractability (druggability) of potential new targets for drug discovery. This is a two-year position funded by the Open Targets initiative.

The appointee will work with scientists from the Open Targets partners to assess, validate and develop methods for quantifying target tractability, with the ultimate goal of incorporating such methodologies into the target validation platform. The initial focus will be on “small molecule” tractability, but we are also interested in other modalities in due course (e.g. antibody therapies). Many of the current methods to assess small molecule tractability are based on the use of 3D protein structures, but such information is only available for a subset of potential targets; a key component of the project is to determine robust methods and pipelines that can be applied to novel targets where there is much more limited information.

For more details or to apply, click here

Closing date is 9th March

(The image above is taken from the Fpocket publication.)

Thursday, 9 February 2017

ChEMBL Webinars

We will be running a new series of webinars over the next few months. These will cover a range of topics, including basic introductions to the Chemogenomics resources (ChEMBL, SureChEMBL, UniChem) as well as more detailed topics such as a schema walkthrough and ChEMBL web services.

The first webinar will be a basic introduction to ChEMBL and will be on 22nd February at 2pm GMT (3pm CET, 9am EST).

If you would like to attend the webinar, please email to register.
Please note that spaces are limited, so please let us know as soon as possible if you register but are then unable to attend.

We will post further details of upcoming webinars here, so watch this space!

The ChEMBL Team