ChEMBL Resources


Thursday, 28 August 2014

SureChEMBL Update 1

As announced in the previous SureChEMBL blogpost, the temporary holding page is now in place, so users who visit the original SureChem site will be redirected to it.

For updates on the release of the new SureChEMBL site, please keep an eye on the ChEMBL-og.

Tuesday, 26 August 2014

SureChEMBL Coming Very Soon

In the coming weeks we will be very pleased to announce the release of the new SureChEMBL website. Since the beginning of the year, we have been working hard with the folks over at Digital Science, along with all the content and software providers, to get the system set up and running in our own Amazon Web Services environment. As we approach the final stages of the transition, we will need to temporarily halt access to the original SureChem site. This minor disruption will allow us to complete the testing of the additional functionality we have added to the SureChEMBL user interface.

We will use ChEMBL-og as the primary route of communicating with users, so if you want to be kept up to date, bookmark the site. We will also make ad hoc tweets about SureChEMBL on @johnpoverington, @georgeisyourman, @surechembl and @chembl.

SureChEMBL User Interface

Users familiar with the previous SureChem UI will find a lot in common with the new SureChEMBL UI. A summary of the changes and new features we have added to the SureChEMBL UI is provided below:
  • A user account is no longer required to access the system
  • All users will have access to ‘Pro’ account features, which include chemistry exports, PDF downloads and enhanced search filters
  • UniChem has been integrated and provides dynamic cross references to external chemical resources
  • The new SCHEMBL identifier is used throughout the interface
  • Updated compound sketchers (latest Marvin JS and JSME)
  • Rebranding of headers and footers and removing old SureChem references
We will be keeping an eye on usage of the UI, and don't know what to expect in terms of new users. We will then review whether the hardware needs scaling to cope with the load, now that the default 'Pro' system is open to all.

SCHEMBL Identifier

In line with ChEMBL IDs, all compounds in SureChEMBL have been given SCHEMBL identifiers. For example, SCHEMBL1353 corresponds to 2-(acetyloxy)benzoic acid, aka aspirin. The identifier can be used to access the SureChEMBL compound page and will be included in all SureChEMBL downloads.
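If you handle these identifiers in your own scripts, a small validation helper can be handy. The sketch below assumes the format is simply the prefix SCHEMBL followed by digits, as in the SCHEMBL1353 example; that pattern is inferred from the example rather than taken from any formal specification, and the function name is purely illustrative.

```python
import re

# Assumed format: literal "SCHEMBL" followed by one or more digits.
# This is inferred from the SCHEMBL1353 example, not an official definition.
SCHEMBL_PATTERN = re.compile(r"^SCHEMBL(\d+)$")


def parse_schembl_id(identifier):
    """Return the numeric part of a SCHEMBL identifier, or None if it does not match."""
    match = SCHEMBL_PATTERN.match(identifier.strip().upper())
    return int(match.group(1)) if match else None


aspirin_number = parse_schembl_id("SCHEMBL1353")
not_a_schembl_id = parse_schembl_id("CHEMBL25")
```

Normalising case and whitespace before matching keeps the helper tolerant of identifiers pasted from spreadsheets or emails, while a ChEMBL ID such as CHEMBL25 is rejected.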

SureChEMBL Data Content

The SureChEMBL pipeline has been running daily throughout the summer and has now processed and extracted an additional ~400,000 novel compounds from patents since SureChem’s pipeline freeze. At the time of writing (16:22 22/08/14), the SureChEMBL counts are:
  • Total number of compounds 15,668,22
  • Total number of annotated patents 12,888,125
The rate at which novel compounds and annotated patents arrive is truly staggering: approximately 80,000 compounds, extracted from 50,000 patents, are added to the system every month. Moreover, in most cases the latency between a new patent document's application date and it becoming searchable in the system is only 2 to 7 days.

SureChEMBL and UniChem

The complete SureChEMBL structure repository has been added to UniChem (src_id=15) and consists of 15.2M unique structures mapped to their SCHEMBL IDs. SureChEMBL updates will be added to UniChem on a weekly basis, so that UniChem will be up to date with novel patent chemistry.

SureChEMBL Data Access

Besides availability in UniChem, the complete SureChEMBL structure repository is provided as SD and TSV files on our ftp site.

It has to be emphasised here that this is the raw compound feed as extracted automatically from text and images, and it is provided without any further filtering or manual curation. This feed contains fragments, radicals, atoms with wrong valencies, polymers and other oddities, but if you are the sort of person who wants to use this raw data, you will know what to filter out and how to do it.

The chemical registry rules of SureChEMBL and ChEMBL have not been fully aligned yet, because they use fundamentally different toolkits, so there are sometimes multiple SCHEMBL IDs for the same InChI. If you know this is an issue for you, you will know how to fix it for your local purposes once you have downloaded the data.
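For anyone who wants a starting point, here is a minimal sketch of how the multiple-IDs-per-InChI cases could be collected from a tab-separated download. The column names (schembl_id, inchi) and the tiny sample are made up for illustration and are assumptions, so check them against the header of the file you actually download.

```python
import csv
import io
from collections import defaultdict


def group_ids_by_inchi(tsv_file, id_column="schembl_id", inchi_column="inchi"):
    """Map each InChI to the sorted list of SCHEMBL IDs registered for it.

    The column names are hypothetical; adjust them to the real file header.
    """
    groups = defaultdict(set)
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        groups[row[inchi_column]].add(row[id_column])
    return {inchi: sorted(ids) for inchi, ids in groups.items()}


def duplicated_inchis(groups):
    """Keep only the InChIs that carry more than one SCHEMBL ID."""
    return {inchi: ids for inchi, ids in groups.items() if len(ids) > 1}


# Tiny made-up sample standing in for the downloaded file.
SAMPLE = (
    "schembl_id\tinchi\n"
    "SCHEMBL1\tInChI=1S/CH4/h1H4\n"
    "SCHEMBL2\tInChI=1S/H2O/h1H2\n"
    "SCHEMBL3\tInChI=1S/CH4/h1H4\n"
)

groups = group_ids_by_inchi(io.StringIO(SAMPLE))
duplicates = duplicated_inchis(groups)
```

With a real download you would pass an opened file handle instead of the in-memory sample. What you do with the duplicates (keep the lowest ID, merge the records, and so on) is a local choice that depends on your use case.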

Initially, the SureChEMBL files on the ftp site will be updated on a quarterly basis.

SureChEMBL Future Plans


Going forward we have many plans related to SureChEMBL, some of which are linked to our involvement in the Open PHACTS project. Our current plans include:
  • Extraction of biological entities from the patent literature
  • SureChEMBL API release 
  • Updated workflow tool integration (e.g. KNIME and Pipeline Pilot)
You will hear more about these plans over the coming year, but our top priority now is to deliver the new SureChEMBL user interface.

If you have any questions about the new SureChEMBL system and data, please get in touch.

Saturday, 16 August 2014

Citing ChEMBL, and Data DOIs

There are now multiple formats and ways to access the ChEMBL data, and we have recently assigned DOIs to all available versions of ChEMBL (and will archive these on the ftp server, permanently).

So when you publish work that uses ChEMBL, please reference the following papers:

ChEMBL Database
A. Gaulton, L. Bellis, J. Chambers, M. Davies, A. Hersey, Y. Light, S. McGlinchey, R. Akhtar, A.P. Bento, B. Al-Lazikani, D. Michalovich, & J.P. Overington (2012) ‘ChEMBL: A Large-scale Bioactivity Database For Chemical Biology and Drug Discovery’ Nucleic Acids Res. Database Issue, 40 D1100-1107. DOI:10.1093/nar/gkr777 PMID:21948594

A.P. Bento, A. Gaulton, A. Hersey, L.J. Bellis, J. Chambers, M. Davies, F.A. Krüger, Y. Light, L. Mak, S. McGlinchey, M. Nowotka, G. Papadatos, R. Santos & J.P. Overington (2014) ‘The ChEMBL bioactivity database: an update’ Nucleic Acids Res. Database Issue, 42 D1083-1090. DOI:10.1093/nar/gkt1031 PMID:24214965

R. Ochoa, M. Davies, G. Papadatos, F. Atkinson & J.P. Overington (2014) ‘myChEMBL: A virtual machine implementation of open data and cheminformatics tools’ Bioinformatics, 30 298-300. DOI:10.1093/bioinformatics/btt666 PMID:24262214

S. Jupp, J. Malone, J. Bolleman, M. Brandizi, M. Davies, L. Garcia, A. Gaulton, S. Gehant, C. Laibe, N. Redaschi, S.M. Wimalaratne, M. Martin, N. Le Novère, H. Parkinson, E. Birney & A.M. Jenkinson (2014) ‘The EBI RDF Platform: Linked Open Data for the Life Sciences’ Bioinformatics, 30 1338-1339. DOI:10.1093/bioinformatics/btt765 PMID:24413672

Also please reference the version of ChEMBL you may have used in any published analyses, using the following DOIs:







Future releases will adhere to the following patterns. We will be modifying the attribution part of the ChEMBL license to require reporting of these DOIs in publications that use ChEMBL. We hope this will contribute to reproducibility of analyses.

Friday, 8 August 2014

Registry numbers in ChEMBL

The numbers are in - the public vote (N=69, so quite small) was overwhelmingly (in roughly a 3:1 ratio) in favour of including registry numbers in ChEMBL/UniChem, as you will see from the screenshot above. There was some discussion (see the Google+ and ChEMBL-og comments for details), as well as some Twitter response (it's pretty easy to hunt down if you are really, really interested). So we will see what we can do...

ChEMBL US Tour - an update

We've had a great response to our call for offers of venues to help us on a ChEMBL outreach tour, funded by the project. Things are shaping up pretty well, but we probably still have space for something in the Seattle area, and maybe also for something in Philadelphia. Due to the very positive response, we will probably also do both East and West coasts on the same trip.

Get in touch if you are in the north-west, or north-east!


Monday, 4 August 2014

ChEMBL US Tour 2014

We have some specific funding to do some training and outreach for ChEMBL (including UniChem and SureChEMBL). We would like to set something up on either the East or West coast - the map above is the Google Analytics view for the blog (remember, we don’t run any analytics on the ChEMBL site, we respect your privacy). Based on this there are a couple of realistic options, and we can only do one of these this year:
  • West Coast - Seattle, Bay Area and San Diego, or
  • East Coast - RTP, DC, Philly, New York, New Jersey and Boston.

We are thinking of sometime in November or early December 2014.

In order to make this work, we would need a local coordinator to arrange rooms, advertise to interested local users, and so forth, and also to give some assistance with logistics planning. If you have access to special rates at hotels, that would be great.

We could run either a chalk-and-talk set of lectures at each location or, if you have a training room with computers, some workshops/hands-on training. We would typically cover:
  1. Introduction to ChEMBL, UniChem and SureChEMBL
  2. Application of ChEMBL in lead discovery and medicinal chemistry
  3. Patent searching in SureChEMBL
  4. Drugs and Targets in ChEMBL
  5. Using KNIME with ChEMBL
  6. Database schema and SQL querying, myChEMBL

So, any interest in hosting us for a day, and what would you like to hear about? Please mail us!

If there is sufficient interest, we will look into which option (East or West coast) has the most potential meetings.

Once we’ve set something up, we’ll post an itinerary and further details on the ChEMBL-og.

Friday, 1 August 2014

Should CAS numbers be in ChEMBL and/or UniChem?

A very quick survey to add excitement to either your holiday or work-day! This is not one of those sucker links where a 0.24% complete progress bar appears on the second page; it's just a simple yes/no question on whether it's a good idea to add CAS registry numbers to ChEMBL and/or UniChem. No promises that we can deliver this, but depending on how you vote, we will consider our options.

Update: Given the multiple channels out there, there are also comments on this on LinkedIn (in the ChUG - "ChEMBL User Group" group - why not join, if you're not already) and a couple on Google+.

Update 2: I'll let the poll run till the end of the week (Friday 8th August 2014) - and then write something up on the results.