ChEMBL Resources


Tuesday, 23 January 2018

Targets in ChEMBL through the years

Evolution of targets over time

While ChEMBL was first released in 2009, the data on which it is built come from publications dating back to 1975. Although coverage of the early years is sparse compared with today, it is interesting to see how the publicly available data for targets has grown over time. This interactive plot presents key data for each of ChEMBL’s targets over the years, in a style inspired by the late Hans Rosling’s TED talk on global development (if you haven't already seen it, I recommend that you watch it now!).

As shown above, dragging the slider at the bottom of the plot changes the year, so that the plot reflects the data available up to that point. The following values are shown:

  • Y-axis: The cumulative number of compounds with a pChEMBL value for the target
  • X-axis: The maximum pChEMBL value or LLE (depending on the radio button selection) achieved to date for the target
  • Point size: The maximum phase achieved for the target
  • Colour: Protein classification
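The per-year values listed above can be sketched as a simple aggregation over activity records. This is a minimal illustration, not the code behind the plot: the `Activity` record and its field names are hypothetical, counting distinct compounds is an assumption, LLE is taken as pChEMBL minus ALogP, and because the post does not say how phase was tracked over time, the maximum phase is read from a fixed, hypothetical per-target lookup.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


@dataclass(frozen=True)
class Activity:
    """Hypothetical activity record; field names are illustrative, not ChEMBL columns."""
    target: str
    year: int                # publication year of the source document
    compound: str
    pchembl: float
    alogp: Optional[float]   # may be missing for some compounds


@dataclass
class TargetSnapshot:
    n_compounds: int          # Y-axis: compounds with a pChEMBL value so far
    max_pchembl: float        # X-axis option 1
    max_lle: Optional[float]  # X-axis option 2 (None if no ALogP available)
    max_phase: int            # point size


def snapshot(activities: Iterable[Activity], year: int,
             phases: Dict[str, int]) -> Dict[str, TargetSnapshot]:
    """Aggregate per-target values using only data published up to and including `year`.

    `phases` is an assumed target -> maximum phase lookup; targets absent from it get 0.
    """
    compounds: Dict[str, Set[str]] = {}
    best_p: Dict[str, float] = {}
    best_lle: Dict[str, float] = {}
    for a in activities:
        if a.year > year:
            continue  # not yet published at this slider position
        compounds.setdefault(a.target, set()).add(a.compound)
        best_p[a.target] = max(best_p.get(a.target, a.pchembl), a.pchembl)
        if a.alogp is not None:
            lle = a.pchembl - a.alogp  # assumed definition: LLE = pChEMBL - ALogP
            best_lle[a.target] = max(best_lle.get(a.target, lle), lle)
    return {
        t: TargetSnapshot(
            n_compounds=len(cs),
            max_pchembl=best_p[t],
            max_lle=best_lle.get(t),
            max_phase=phases.get(t, 0),
        )
        for t, cs in compounds.items()
    }
```

Calling `snapshot` once per year in the slider range gives one frame of the animation; note that the best LLE need not come from the compound with the best pChEMBL value, which is why the two are tracked separately.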

Hovering the mouse over a data point reveals the target's name; however, as the number of points increases, it may become difficult to make sense of the data. In addition to the controls at the top of the plot, which allow you to zoom and pan, you can filter the data by protein classification. For example, single-clicking on "Enzyme" toggles these points on and off, while double-clicking turns all other points off, allowing you to isolate the data for enzymes.

Use the plot to explore the target data in ChEMBL, and feel free to share any interesting observations in the comments.

The plot was created using Dash and Plotly. You can view a larger version of the plot here, or download the source code here.

Thursday, 11 January 2018

Software Engineer Wanted!

We are currently seeking a talented Software Engineer to work on developing our exciting SureChEMBL resource.

SureChEMBL is a publicly available large-scale database containing compounds extracted from the full text, images and attachments of patent documents. The data are extracted from the patent literature according to an automated text and image-mining pipeline on a daily basis (see for more information), producing a database of more than 19 million chemical structures.

The successful candidate will have a minimum of 3 years of professional development experience, with strong core Java Enterprise Edition development skills (please see the job description below for full requirements).

For more details of the position, or to apply, please visit:

The closing date for applications is 21st January 2018.