ChEMBL Resources


Monday, 28 October 2013

EU-OPENSCREEN 3rd Stakeholder Meeting, Oslo, Norway

Dear future user, partner, collaborator or supporter!

The ESFRI project EU-OPENSCREEN is an academic infrastructure initiative in Chemical Biology to serve your research needs. We are currently preparing the implementation of this pan-European infrastructure of open screening platforms to support basic and applied research. EU-OPENSCREEN will offer access to a unique compound library representing the know-how of European chemists, to a broad range of cutting-edge screening technologies, to valuable tool compounds for research, and to the knowledge that emerges from validated output of hundreds of screens stored and made publicly available in a central database.

We cordially invite you to join us in Oslo for an exciting science day where we will report on the progress of the project and the planned services, including the design of the joint European Compound Library, the screening services and the database. In particular, we would like you to share your own experiences from academic screening projects, and thus invite you to present your projects as a poster. From these, highlight projects will be selected for oral presentation.

See for more details.

Sunday, 27 October 2013

Competition Time - Win a Raspberry Pi with ChEMBL - chempi

Here's a free-to-enter competition for a brand new, fully working Raspberry Pi running the brand new chempi implementation. It includes everything you need to get started at home with ChEMBL, with the exception of a power supply and ethernet cable - a sort of in silico Breaking Bad maybe (hopefully not, thinking about it).

We have run out of creative juices and cannot think of a suitable poem to mark the release of chempi - so the competition is for you to finish a limerick for us, starting with the line:

There once was a hacker with chempi....

Entries must be posted in the comments section. Obscene or defamatory entries will be removed (all comments are moderated, so it may take a few hours for your entry to appear, so do not repost twenty times!). We haven't really decided how to pronounce chempi (with a hard 'k' start or a soft 'sh' start, just as with ChEMBL, both are used in the wild; and also does it rhyme with scampi, or the irrational number pi?). All entries will be assumed to be made under CC-BY licensing. The competition will be open until noon GMT on Sunday 10th November 2013.

Entries will be judged for compliance to a standard limerick format, outrageous rhymes with chempi, gratuitous chemistry references, and finally humour.

The judge's decision (i.e. mine) is final. The winning entry will be published on the ChEMBL-og.


PS Before I get asked, the competition is not open to members of the ChEMBL group, or extended family members of the ChEMBL group.

Tastypie & Chempi

One of the immediate consequences of refactoring our webservices using Django, Tastypie and related approaches (as described here) is that we can run them on almost any database backend. Django abstracts communication with the database, and by using custom QueryManagers we were able to implement chemistry-specific operations, such as substructure and similarity search, in a database-agnostic manner.
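To make the database-agnostic idea concrete, here is a minimal plain-Python sketch of the kind of dispatch a custom manager could hide from callers. It is not the ChEMBL code: the table and column names (`compound_mols`, `mol`, `molregno` as used here), the `other_vendor` backend and its `vendor_sss` function are all made up for illustration. The only real piece is the `@>` substructure operator from the RDKit PostgreSQL cartridge.

```python
from typing import Callable, Dict, Tuple

Query = Tuple[str, tuple]                      # (SQL with placeholders, parameters)
_BUILDERS: Dict[str, Callable[[str], Query]] = {}


def register(backend: str):
    """Register a query builder for one database backend."""
    def wrap(fn: Callable[[str], Query]) -> Callable[[str], Query]:
        _BUILDERS[backend] = fn
        return fn
    return wrap


@register("postgresql")
def _rdkit_substructure(smiles: str) -> Query:
    # RDKit cartridge: "a @> b" is true when molecule a contains b as a substructure.
    # Table and column names are hypothetical.
    return ("SELECT molregno FROM compound_mols WHERE mol @> %s", (smiles,))


@register("other_vendor")
def _vendor_substructure(smiles: str) -> Query:
    # Placeholder for a commercial cartridge; vendor_sss() is an invented name.
    return ("SELECT molregno FROM compound_mols WHERE vendor_sss(mol, %s) = 1", (smiles,))


def substructure_query(backend: str, smiles: str) -> Query:
    """Return backend-specific SQL for the same logical substructure search."""
    try:
        builder = _BUILDERS[backend]
    except KeyError:
        raise ValueError(f"no substructure search registered for {backend!r}")
    return builder(smiles)


sql, params = substructure_query("postgresql", "c1ccccc1")
```

Callers only ever name the query molecule; swapping the backend changes the SQL but not the calling code, which is the property the post describes.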

This means that, if we want, we can use only Open Source components (such as Postgres and RDKit), or elect to use optimised commercially sourced software as appropriate. However, what if we go one step further and try to use Open Hardware as well? This is exactly what we've just done! We managed to install the full ChEMBL 17 on a Raspberry Pi.

Some frequently asked questions (at least those that have been asked internally) and technical details are below:

1. How much space does it take?

12 GB, including the OS, data and all relevant software. Unfortunately we used a 32 GB SD card, so that is the size you will need if you would like to use our cloned disk image.

EDIT: Compressed image takes 4.13 GB.

2. What OS is it running?

Raspbian, a free operating system based on Debian.

3. Is it slow?

We haven't run any benchmarks yet. Obviously it's slower than our online web services - but then it's a lot cheaper. On the other hand, having performed some sample requests, we can say that performance is certainly acceptable, and there is a lot of room for improvement: Raspberry Pis can easily be overclocked from 700 MHz to 1 GHz, and according to some benchmarks this can double application speed in some cases. The SD card we used is not the fastest one either. Finally, all caching is disabled because we wanted to save disk space, but using database caching from the Django caching framework should give further major improvements - so maybe use the 32 GB image after all.

Types of request that chempi can be slower on are:

- Image generation, but if we replace the image with JSON from which the image can be generated using HTML5 canvas on the client side (the way we generated images in our game), it can be much faster. More about this topic in a future blog post.
- Queries using aggregate functions such as COUNT (it seems that we need to optimise our Postgres db by adding some more indexes).
- Substructure and similarity search - again, caching, overclocking and some database and cartridge optimisation (choosing faster fingerprints) should solve all the problems. "Premature optimization is the root of all evil", so we first wanted a proof of concept that just works, not necessarily one that works super fast.

4. Can I make my own chempi?

Yes, we are planning to share our SD card image. We will probably use the BitTorrent protocol to do this, because of the image size and some issues we faced when distributing myChEMBL. We do remember that not everyone has mega-fast broadband!

5. Is chempi useful at all?

Although we think it is interesting as a proof of concept to have a chemical database on such small, open hardware, we also think it may have some interesting real-world applications:

- plugging our chempi into a local network makes it immediately accessible to other computers, so this is a zero-configuration demonstration of ChEMBL.
- analogously to the thesis of this paper, it can encourage cheminformatics education on low cost ARM hardware.
- a Raspberry Pi can easily be enhanced with a camera to perform image recognition. This, combined with software like OSRA, can give the ability to scan compound images and search for them in the database.
- adding an e-ink display (for example, a jailbroken Kindle?) can produce a very interesting small machine...

6. What are some of the technical details?

To deploy our webservices (which are just another Django application) we've used Gunicorn as a server, which in turn connects to NGINX via a standard Unix socket. To make it work as a daemon and launch on machine startup, we've used Supervisor. We believe this is an ideal way to deploy Django, not only on a Raspberry Pi but on all production machines, so if you would like to run the ChEMBL webservices locally in your company or academic institution, we suggest doing it this way.
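As a rough illustration of the Gunicorn side of such a setup, the sketch below is a Gunicorn configuration file (these files are plain Python). The setting names (`bind`, `workers`, `timeout`, `accesslog`, `errorlog`) are real Gunicorn settings, but the socket path and the numbers are assumptions chosen for a small board, not the values used on chempi.

```python
# gunicorn.conf.py - illustrative values only

# Listen on a Unix socket that NGINX can proxy requests to.
# The socket path is hypothetical.
bind = "unix:/tmp/chempi-gunicorn.sock"

# Keep the worker count small on low-powered hardware (assumed value).
workers = 2

# Chemical searches can be slow on a small machine, so allow long requests
# before a worker is restarted (assumed value, in seconds).
timeout = 120

# "-" sends logs to stdout/stderr, where a process manager such as
# Supervisor can capture them.
accesslog = "-"
errorlog = "-"
```

Supervisor would then be pointed at the `gunicorn` command for the Django project, and NGINX at the same socket path.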


Saturday, 26 October 2013

USAN Watch: October 2013

The USANs for October 2013 have recently been published.

We have modified the sourcing of this data: we now use the new ChEMBL API to automatically parse the documents and to extract and validate the mol files for the compounds. So in future, these reports should be more timely, complete and fun!

USAN | Research Code | InChIKey (Parent) | Drug Class | Therapeutic class | Target
 | AF-802; CH-5424802 | KDGFLJKFZUIJMX-UHFFFAOYSA-N | synthetic small molecule | therapeutic | ALK
 | GDC-0980.1, G-038390, G-038390.1, RG-7422 | YOVVNQKCSKSHKT-HNNXBMFYSA-N | synthetic small molecule | therapeutic | MTOR,PI3K
cimaglermin-alfa | GGF2, rhGGF2 | n/a | protein | therapeutic | ErbB
 | VX-509, VRT-831509 | ASUGUQWIHMTFJL-QGZVFWFLSA-N | synthetic small molecule | therapeutic | JAK3
 | A-3309; AZD-7806 | | | |
 | GDC-0068; RG-7440 | GRZXWCHAXNAUHY-NSISKUIASA-N | synthetic small molecule | therapeutic | AKT
 | | n/a | peptide | therapeutic | GLP1R
 | MDX-1338, BMS-936564 | n/a | monoclonal antibody | therapeutic | CXCR4
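As a small aside on validating extracted identifiers, the sketch below checks only the shape of a standard InChIKey (14 letters, a hyphen, 10 letters ending in "SA", a hyphen, one letter). It is an illustration of the sort of sanity check an automated pipeline might include, not the validation the ChEMBL API actually performs, and it cannot tell whether a well-formed key belongs to the right structure.

```python
import re

# Standard InChIKey layout: 14-letter hash, 8-letter hash + "S" (standard)
# + "A" (version), then a single protonation letter.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{8}SA-[A-Z]")


def is_wellformed_inchikey(value: str) -> bool:
    """Return True if value has the shape of a standard InChIKey."""
    return bool(INCHIKEY_RE.fullmatch(value.strip()))


# Keys taken from the table above; "n/a" marks biologics without a key.
for key in ["KDGFLJKFZUIJMX-UHFFFAOYSA-N", "n/a"]:
    print(key, is_wellformed_inchikey(key))
```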

Friday, 25 October 2013

New Drug Approvals 2013 - Pt. XVII - Macitentan (Opsumit ®)

ATC Code: C02KX (incomplete)
Wikipedia: Macitentan

On October 13th the FDA approved Macitentan (trade name Opsumit ®) for the treatment of pulmonary arterial hypertension (PAH). Macitentan is an endothelin receptor antagonist with affinity for both the Endothelin ET-A (ETA) and Endothelin ET-B (ETB) receptor subtypes, similar in mechanism of action to the previously licensed drug Bosentan (CHEMBLID957).

The Endothelin ET-A (ETA, CHEMBLID252 ; Uniprot P25101) and Endothelin ET-B (ETB, CHEMBLID1785 ; Uniprot P24530) receptors mediate a number of physiological effects via the natural peptide agonist Endothelin-1 (ET1 , CHEMBL437472 ; Uniprot P05305). In addition to normal roles in supporting homeostasis, these effects can include pathologies such as inflammation, vasoconstriction, fibrosis and hypertrophy.

Macitentan acts as an antagonist for both receptors with both a high affinity and long residence time in human pulmonary arterial smooth muscle cells. Hence it counteracts vasoconstriction and relieves hypertension. One of the metabolites of Macitentan is also pharmacologically active at the ET receptors and is estimated to be about 20% as potent as the parent drug in vitro.

Macitentan (CHEMBL2103873 ; Pubchem : 16004692 ) is a small molecule drug with a molecular weight of 588.3 Da, an AlogP of 3.67, 11 rotatable bonds, and 1 rule of 5 violation.

Canonical SMILES : CCCNS(=O)(=O)Nc1ncnc(OCCOc2ncc(Br)cn2)c1c3ccc(Br)cc3
InChI: InChI=1S/C19H20Br2N6O4S/c1-2-7-26-32(28,29)27-17-16(13-3-5-14(20)6-4-13)18(25-12-24-17)30-8-9-31-19-22-10-15(21)11-23-19/h3-6,10-12,26H,2,7-9H2,1H3,(H,24,25,27)
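The quoted molecular weight can be cross-checked from the formula layer of the InChI. The sketch below is a deliberately small calculator: it assumes a single-component, neutral InChI and only knows the handful of elements that occur here (abridged standard atomic weights), so it is a sanity check rather than a replacement for a chemistry toolkit.

```python
import re

# Abridged standard atomic weights for the elements in this structure.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007,
                 "O": 15.999, "S": 32.06, "Br": 79.904}


def formula_from_inchi(inchi: str) -> str:
    """Return the formula layer, i.e. the second '/'-separated field."""
    return inchi.split("/")[1]


def molecular_weight(formula: str) -> float:
    """Sum atomic weights for a simple formula such as 'C19H20Br2N6O4S'."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHT[symbol] * (int(count) if count else 1)
    return total


macitentan = ("InChI=1S/C19H20Br2N6O4S/c1-2-7-26-32(28,29)27-17-16"
              "(13-3-5-14(20)6-4-13)18(25-12-24-17)30-8-9-31-19-22-10-15"
              "(21)11-23-19/h3-6,10-12,26H,2,7-9H2,1H3,(H,24,25,27)")
mw = molecular_weight(formula_from_inchi(macitentan))
print(round(mw, 1))  # agrees with the 588.3 Da quoted above
```

A weight above 500 Da is also what accounts for the single rule of 5 violation mentioned in the text.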

10 mg once daily. Doses higher than 10 mg once daily have not been studied in patients with PAH and are not recommended.

Metabolism and Elimination 
Following oral administration, the apparent elimination half-lives of macitentan and its active metabolite are approximately 16 hours and 48 hours, respectively. Macitentan is metabolized primarily by oxidative depropylation of the sulfamide to form the pharmacologically active metabolite. This reaction is dependent on the cytochrome P450 (CYP) system, mainly CYP3A4 with a minor contribution of CYP2C19. It is interesting to note the presence of bromine atoms in two of the aryl rings; typically a lighter halogen, usually fluorine, is used to block oxidative P450-mediated metabolism at these exposed aromatic positions.

At steady state in PAH patients, the systemic exposure to the active metabolite is 3 times the exposure to macitentan and is expected to contribute approximately 40% of the total pharmacologic activity. In a study in healthy subjects with radiolabeled macitentan, approximately 50% of radioactive drug material was eliminated in urine but none was in the form of unchanged drug or the active metabolite. About 24% of the radioactive drug material was recovered from feces.
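The "approximately 40%" figure is roughly what one gets from a back-of-envelope calculation combining the 3-fold exposure with the metabolite being about 20% as potent as the parent in vitro. The sketch below assumes, purely for illustration, that activity scales linearly with potency times exposure; it ignores protein binding, tissue distribution and everything else a real pharmacokinetic model would include.

```python
def metabolite_share(relative_potency: float, relative_exposure: float) -> float:
    """Fraction of total activity attributable to the metabolite.

    Both arguments are expressed relative to the parent drug, whose
    potency and exposure are taken as 1.0.
    """
    parent_activity = 1.0
    metabolite_activity = relative_potency * relative_exposure
    return metabolite_activity / (parent_activity + metabolite_activity)


# 20% as potent, 3 times the exposure: 0.6 / 1.6
share = metabolite_share(0.20, 3.0)
print(f"{share:.1%}")
```

This gives a little under 40%, consistent with the rounded value quoted above.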

Macitentan may cause fetal harm when administered to a pregnant woman. Macitentan is contraindicated in females who are pregnant.

Other ERAs have caused elevations of aminotransferases, hepatotoxicity, and liver failure. Obtain liver enzyme tests prior to initiation of Macitentan and repeat during treatment as clinically indicated.

Hemoglobin Decrease 
Decreases in hemoglobin concentration and hematocrit have occurred following administration of other ERAs and were observed in clinical studies with Macitentan. These decreases occurred early and stabilized thereafter. Initiation of Macitentan is not recommended in patients with severe anemia. Measure hemoglobin prior to initiation of treatment and repeat during treatment as clinically indicated.

Strong CYP3A4 Inducers / Inhibitors
Strong inducers of CYP3A4 such as rifampin significantly reduce macitentan exposure, whereas concomitant use of strong CYP3A4 inhibitors like ketoconazole approximately doubles macitentan exposure. Many HIV drugs like ritonavir (CHEMBL163) are strong inhibitors of CYP3A4.

The license holder is Actelion Pharmaceuticals US, and the full prescribing information can be found here.

Tuesday, 22 October 2013

ChEMBL Web Service Update 3: Image Rendering Changes

If you are a follower of this blog you will have seen some earlier posts (here and here) providing details on changes we are making to our Web Services. I recommend reviewing the previous posts, but in summary we have set up a temporary base URL to allow existing ChEMBL Web Service users to test the new ChEMBL API powered Web Services. The new temporary base URL is:

As well as providing users with all existing functionality we have also added a couple of extra features, one of which is improved molecule rendering options. The current live Web Services provide the following REST call to allow you to get a molecule image:


You are able to provide a dimension argument (pixels) to change the size of the image:

The image quality has deteriorated because the image returned is simply a re-sized version of the first image. The new ChEMBL API powered Web Services address this issue by dynamically generating the images, using either the RDKit or the Indigo chemistry toolkits (defaults to RDKit). So, to get an image using the new services, you just need to add '2' to the base URL:

When using the dimensions argument with the new Web Services you now get the following improved smaller image:

The coordinates used to generate the image are based on those found in the ChEMBL192 molfile. All current ChEMBL images are produced using Pipeline Pilot, which is currently set up to ignore the molfile coordinates and lay out the molecule how it sees best. This explains why the layout of the first two images is different from that of the second two. We can get the new Web Services to ignore the coordinates and let the chemical toolkit lay out the molecule how it sees best using the ignoreCoords=1 argument:

If you would prefer to use Indigo to generate your ChEMBL molecule images you can use the engine argument:

Finally, it is also possible to use any combination of the 3 arguments mentioned above:

In summary, the new Web Service base URL extends the current image generating functionality by improving the dimensions argument and introducing the ignoreCoords and engine arguments. More details are in the table below:

Argument Name | Argument Description | Argument Options | Default
dimensions | Size of image in pixels | 1-500 | 500
ignoreCoords | Choose to use or ignore coordinates in ChEMBL molfiles | 1 or 0 | 0 (Use ChEMBL molfile coordinates)
engine | Chemical toolkit used to generate image | RDKit or indigo | RDKit
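For anyone scripting against the image call, the three arguments can be assembled and range-checked before a request is made. The sketch below only builds the query string from the argument names and options in the table; the endpoint in the usage line is a placeholder on example.org, not the real service URL, and the exact capitalisation the service expects for the engine value is an assumption.

```python
from urllib.parse import urlencode

ENGINES = {"rdkit", "indigo"}


def image_query(dimensions: int = 500, ignore_coords: bool = False,
                engine: str = "RDKit") -> str:
    """Build the query string for the image call, validating each argument."""
    if not 1 <= dimensions <= 500:
        raise ValueError("dimensions must be between 1 and 500 pixels")
    if engine.lower() not in ENGINES:
        raise ValueError("engine must be RDKit or indigo")
    params = {
        "dimensions": dimensions,
        "ignoreCoords": int(ignore_coords),  # 1 = ignore molfile coordinates
        "engine": engine,                    # passed through as given
    }
    return urlencode(params)


def image_url(image_endpoint: str, **kwargs) -> str:
    """Append the query string to a caller-supplied image endpoint."""
    return f"{image_endpoint}?{image_query(**kwargs)}"


# Placeholder endpoint, for illustration only.
print(image_url("https://example.org/image-endpoint",
                dimensions=200, ignore_coords=True, engine="indigo"))
```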

We hope you find these image rendering changes useful. If you have any questions, please let us know via mail to "chembl-help at".

The ChEMBL Team

Friday, 18 October 2013

ChEMBL KNIME training?


We recently did some KNIME training for ChEMBL at a workshop, and it was very popular. It made us think a little about just how much was available within KNIME for ChEMBL, and we thought we'd ask if there was interest in us running a specific, detailed course on ChEMBL/KNIME next year.

So here's a poll. We'll keep this open for a month (i.e. closes 18th November 2013) and then decide what to do (if anything).

The stoopid free poll server I used doesn't like the Safari browser - so I'll transfer across to another system over the weekend, and try and transfer votes. Thank you to those that have voted so far.

Would you be interested in Knime ChEMBL training?
Yes - I'd like a two day course at the EBI next year
Yes - I'd like webinars
Yes - I'd like you to visit our lab (charge involved)
Yes - but not from you guys.
No - Knime, what's that

Thursday, 17 October 2013

New Bot on the Blog

Following the success of our ChEMBL Bot, there is now a new faithful bot out there which answers to the name @MalariaSARLit and is looking for new followers. Its job is to tenaciously monitor PubMed for new malaria-related publications, score them according to our ChEMBL-likeness score and tweet a ChEMBL-like one daily at noon GMT. Followers of the bot will get a free and reliable antimalarial SAR paper alert every day in their twitter feed.  

George (NKOTB fan)

Sunday, 13 October 2013

New Drug Approvals 2013 - Pt. XVI - Riociguat (Adempas™)

ATC code: not yet assigned
Wikipedia: Riociguat

On October 8, 2013, the FDA approved riociguat for the treatment of patients suffering from two forms of pulmonary hypertension - chronic thromboembolic pulmonary hypertension (CTEPH), and pulmonary arterial hypertension (PAH).

Pulmonary hypertension (PH) is a disease characterized by abnormally high blood pressure in the lungs, which increases the workload for the right ventricle of the heart. Some of the symptoms of PH are dizziness, shortness of breath and water deposits in the legs and joints. PH progresses slowly and can lead to severe and often fatal circulatory and respiratory complications. CTEPH is a form of PH caused by blood clots obstructing the passage of blood through the vessels in the lung, often after a pulmonary embolism has occurred. PAH on the other hand is caused by a chronic tightening or constriction of blood vessels.

Riociguat (CHEMBL2107834) is a stimulator of soluble guanylate cyclase (sGC), an enzyme that is activated by increased levels of nitric oxide (NO). Downstream signalling of increased levels of cGMP (CHEBI:28181) causes the dilation of the endothelium in blood vessels. sGC is a heterodimer consisting of an alpha- and a beta-subunit. There are two known isoforms for each subunit (Uniprot-ids, alpha: P33402, Q02108 ; beta: Q02153, O75343). Stimulation of the enzyme by riociguat and other sGC stimulators depends on the presence of a reduced heme group in the sGC beta-subunit. The activation of sGC by this class of compounds is synergistic with NO signalling. Some other compounds in this class are YC-1 (CHEMBL333985) and BAY 41-8543 (CHEMBL1916024). In contrast, sGC can also be targeted through activators that work independently of NO signalling.

Canonical SMILES: COC(=O)N(C)c1c(N)nc(nc1N)c2nn(Cc3ccccc3F)c4ncccc24
Std-InChI:  InChI=1S/C20H19FN8O2/c1-28(20(30)31-2)15-16(22)25-18(26-17(15)23)14-12-7-5-9-24-19(12)29(27-14)10-11-6-3-4-8-13(11)21/h3-9H,10H2,1-2H3,(H4,22,23,25,26)

Riociguat has a molecular weight of 422.42 Da. The calculated LogP for riociguat is 2.34 and the compound has no stereo-centers.

The compound is administered orally and was approved through the FDA priority review program. It has a black box warning because it can harm fetuses and is therefore not prescribed to pregnant women. Other adverse effects of riociguat include headache, dizziness, indigestion, peripheral edema, nausea, diarrhea and vomiting.

Riociguat is a first-in-class compound and was developed by Bayer HealthCare Pharmaceuticals.

Riociguat will be marketed as a prescription medicine under the name Adempas.

Wednesday, 9 October 2013


Yesterday saw the release of the EMBL-EBI RDF Platform; the official announcement can be found here. The purpose of this new platform is to act as a central resource for all RDF and Semantic Technology focused work being carried out at the EMBL-EBI. The benefit to users of the RDF version of the ChEMBL database is that you now have access to documentation, a SPARQL endpoint, example SPARQL queries and a Linked Data browser.

Other EMBL-EBI resources involved in this project include BioModels, BioSamples, Expression Atlas, Reactome and UniProt - we expect the number of resources offering RDF versions of their data to grow over the coming year.

One of the very cool things the new platform offers users is the ability to run federated SPARQL queries across the separate resources listed above. Essentially this removes the data integration burden that would previously have been required in order to answer the questions asked by the federated queries. Example federated SPARQL queries include:

We hope you find the new resource useful; please use this page to provide feedback.

New Drug Approvals 2013 - Pt. XV - Vortioxetine Hydrobromide (Brintellix™)

ATC Code: N06AX26
Wikipedia: Vortioxetine

On September 30th 2013, the FDA approved Vortioxetine (as the hydrobromide salt; tradename: Brintellix; research code: Lu AA21004, or Lu AA21004 (HBR) for the hydrobromide salt; ChEMBL: CHEMBL2104993), a multimodal antidepressant indicated for the treatment of major depressive disorder (MDD).

MDD is a mental disorder characterised by low mood and/or loss of pleasure in most activities, and by symptoms or signs such as increased fatigue, change in appetite or weight, insomnia or excessive sleeping and suicide attempts or thoughts of suicide. MDD is believed to arise from low levels of neurotransmitters (primarily serotonin (5-HT), norepinephrine (NE) and dopamine (DA)) in the synaptic cleft between neurons in the brain. Several antidepressants for the treatment of MDD are already available on the market, and the choice depends on which symptoms need to be tackled. The most important classes of antidepressants include the Selective Serotonin Reuptake Inhibitors (SSRIs) such as Fluoxetine (ChEMBL: CHEMBL41), Sertraline (ChEMBL: CHEMBL809), Paroxetine (ChEMBL: CHEMBL490), Fluvoxamine (ChEMBL: CHEMBL814) and Escitalopram (ChEMBL: CHEMBL1508), which are believed to keep the levels of 5-HT high in the synapse; and the Serotonin-Norepinephrine Reuptake Inhibitors (SNRIs) such as Venlafaxine (ChEMBL: CHEMBL637), Duloxetine (ChEMBL: CHEMBL1175), Desvenlafaxine (ChEMBL: CHEMBL1118) and Milnacipran (ChEMBL: CHEMBL259209), which in turn are thought to maintain higher levels of 5-HT and NE in the synapse. Vortioxetine is a novel multimodal serotonergic compound, which displays antagonistic properties at serotonin receptors 5-HT3A (ChEMBL: CHEMBL1899; Ki=3.7nM) and 5-HT7 (ChEMBL: CHEMBL3155; Ki=19nM), partial agonist properties at 5-HT1B receptors (ChEMBL: CHEMBL1898; Ki=33nM), agonistic properties at 5-HT1A receptors (ChEMBL: CHEMBL214; Ki=15nM) and potent inhibition at the serotonin transporter (SERT) (ChEMBL: CHEMBL228; Ki=1.6nM). The contribution of these activities to the antidepressant action of Vortioxetine is not fully understood; however, Vortioxetine is believed to be the first compound with this combination of pharmacodynamic activity.

Vortioxetine is a synthetic small molecule with a molecular weight of 298.5 g.mol-1 (379.4 g.mol-1 for the hydrobromide salt), an ALogP of 4.5, 3 hydrogen bond acceptors and 1 hydrogen bond donor, and is therefore fully compliant with Lipinski's rule of five.
IUPAC: 1-[2-(2,4-Dimethyl-phenylsulfanyl)-phenyl]-piperazine, hydrobromide
Canonical SMILES: Cc1ccc(Sc2ccccc2N3CCNCC3)c(C)c1
InChI: InChI=1S/C18H22N2S/c1-14-7-8-17(15(2)13-14)21-18-6-4-3-5-16(18)20-11-9-19-10-12-20/h3-8,13,19H,9-12H2,1-2H3
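The rule-of-five statement can be checked mechanically from the four properties quoted above. The function below is a minimal sketch of the usual Lipinski criteria, using the reported ALogP as a stand-in for logP; the function name and argument order are our own.

```python
def rule_of_five_violations(mol_weight: float, logp: float,
                            h_donors: int, h_acceptors: int) -> int:
    """Count violations of Lipinski's rule of five."""
    checks = [
        mol_weight > 500,   # molecular weight no more than 500 Da
        logp > 5,           # logP no more than 5
        h_donors > 5,       # no more than 5 hydrogen bond donors
        h_acceptors > 10,   # no more than 10 hydrogen bond acceptors
    ]
    return sum(checks)


# Vortioxetine free base, values as quoted in the text.
violations = rule_of_five_violations(298.5, 4.5, h_donors=1, h_acceptors=3)
print(violations)
```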

The recommended starting dose of Vortioxetine is 10 mg administered orally once daily. The dose should then be increased to 20 mg/day, as tolerated. For patients who do not tolerate higher doses, a dose of 5 mg/day should be considered. Vortioxetine is 75% orally bioavailable, with an apparent volume of distribution of 2600L, a plasma protein binding of 98% and a terminal half-life of ca. 66 hours. Vortioxetine is extensively metabolised, primarily through oxidation via cytochrome P450 enzymes CYP2D6, CYP3A4/5, CYP2C19, CYP2C9, CYP2A6, CYP2C8 and CYP2B6 and subsequent glucuronic acid conjugation. CYP2D6 is the primary enzyme catalysing the metabolism of Vortioxetine to its major, pharmacologically inactive, carboxylic acid metabolite. Poor metabolisers of CYP2D6 have approximately twice the Vortioxetine plasma concentration of extensive metabolisers and therefore the maximum recommended dose in known CYP2D6 poor metabolisers is 10 mg/day. Vortioxetine is excreted in the urine (59%) and feces (26%) as metabolites, with a negligible amount of unchanged compound being excreted in the urine up to 48 hours.

The license holder of Vortioxetine is H. Lundbeck A/S and the full prescribing information can be found here.

Tuesday, 8 October 2013

Paper: Target Prediction for an Open Access Set of Compounds Active against Mycobacterium tuberculosis

Here's a paper detailing some multi-method target prediction work as part of the GeMoA FP7 project. Proud, as ever, to publish Open Access.

%A Martínez-Jiménez F
%A Papadatos G
%A Yang L
%A Wallace IM
%A Kumar V
%A Pieper U
%A Sali A
%A Brown JR
%A Overington JP
%A Marti-Renom MA
%D 2013 
%T Target Prediction for an Open Access Set of Compounds Active against Mycobacterium tuberculosis
%J PLoS. Comput. Biol. 
%V 9
%P e1003253
%O doi:10.1371/journal.pcbi.1003253

Friday, 4 October 2013

ChEMBL Web Service Update 2: JSONP Support

We posted earlier in the week about some behind the scenes changes we had made to our Web Services. Having read that post (if you missed the post and use our Web Services please take a look), you will know we set up a temporary base URL to allow users to test the new ChEMBL API powered services. The base URL is:

We have made it straightforward for users to test the new services as all current methods are available using the new base URL. As well as maintaining existing functionality, we have also been able to add a couple of new features, the first of which is JSONP support. Those familiar with web application development will know the issue of requesting data from a domain different from the one the web application is running on. This type of data request is prevented by the web browser, due to the enforcement of the same-origin policy. This is an important security concept, but there are times when being able to pull data in from a trusted source enhances the functionality of the web application and makes the life of the developer much easier. Adding JSONP support to the ChEMBL Web Services allows users to now pull ChEMBL data into their web pages with minimal effort. So how do you add JSONP support? Simple, you add an extra argument to the Web Service call which provides the name of a callback function, which is then used to wrap the regular JSON response.
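To show what "wrapping the regular JSON response" amounts to, here is a small Python sketch of both directions: producing a JSONP body from a payload and a callback name, and unwrapping one again. It illustrates the general JSONP convention rather than the ChEMBL implementation; the payload and the callback name are made up, and the check that the callback looks like a JavaScript identifier is a precaution added here, not something the service is stated to enforce.

```python
import json
import re

_IDENTIFIER = r"[A-Za-z_$][\w$]*"


def to_jsonp(payload, callback: str) -> str:
    """Wrap a JSON-serialisable payload in a call to the named callback."""
    if not re.fullmatch(_IDENTIFIER, callback):
        raise ValueError("callback must look like a JavaScript identifier")
    return f"{callback}({json.dumps(payload)});"


def from_jsonp(text: str):
    """Split a JSONP body back into (callback name, decoded payload)."""
    match = re.fullmatch(rf"\s*({_IDENTIFIER})\((.*)\);?\s*", text, re.DOTALL)
    if not match:
        raise ValueError("not a JSONP response")
    return match.group(1), json.loads(match.group(2))


# Hypothetical payload and callback name.
body = to_jsonp({"chemblId": "CHEMBL192"}, "handleCompound")
print(body)
```

In a browser the same body would be loaded through a script tag, so the page's own `handleCompound` function receives the data without a cross-origin request being blocked.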

Currently you can request a JSON response with the following URL:

To create a JSONP response you add the callback argument (Note, you do not need to include .json and the callback argument can be any value):

We hope you find this useful and if you have any questions get in touch.

The ChEMBL Team

Thursday, 3 October 2013

ChEMBL Virtual Machine (a.k.a. myChEMBL)

With last week's ChEMBL_17 release out of the way, we have had time to revisit our ChEMBL Virtual Machine project. This project, which we now refer to as myChEMBL, is aimed at providing users with a complete, free and easy-to-install cheminformatics infrastructure. To achieve this we have provided users with an Ubuntu-based virtual machine, which comes with:
  • A PostgreSQL database, preloaded with ChEMBL_17. You will notice some extra tables, which are required to allow the RDKit chemical searching.
  • The latest build of RDKit taken from the RDKit GitHub repo.
  • A web application to allow you to query the ChEMBL database. You can pick up a copy of the web application from Rodrigo Ochoa's GitHub repo.
You might now ask: how do I get a copy of myChEMBL? The answer is to visit the ChEMBL ftpsite:

To install myChEMBL you will need to use some virtualisation software, such as VirtualBox or VMware Fusion. You will find installation instructions on the ftpsite, which describe how to load myChEMBL into VirtualBox. These will soon appear on this blog - with pictures :)

Get in touch via mail to if you have any myChEMBL questions.

The ChEMBL Team

Compound Curation - The story so far...

As chemical curator for ChEMBL, I spend a lot of time processing, checking and standardising the compounds in the database. I use various pieces of software for this, but mostly it’s Pipeline Pilot. For those of you who don’t know, Pipeline Pilot is Accelrys’s graphical scientific workflow authoring application, that allows passing hundreds of thousands of compounds through various components to make sure they meet our standards to be loaded into ChEMBL.

However, it’s always incredibly useful to utilise other available software in a complementary manner to see if anything may have been missed, could be done in a different way or just to see what alternative results you can get. One such open source software package is Indigo, created by GGA Software Services. One of the web application developers was passing all of the ChEMBL compounds through the standard Indigo loader, via a Python script, during the course of his work, and found that there were about 9,000 compounds (0.7% of the current database) that failed to be loaded. The list of exceptions was then examined to see where the errors had come from. An important lesson here is that different toolkits will throw different exceptions, since these structures were all happy within the PP environment.
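The kind of script described can be sketched in a few lines: pass every structure through a loader, keep the exception message for each failure, and tally the messages to see which problems dominate. In the sketch below the loader is a toy stand-in so that the example runs without any chemistry toolkit installed; in a real run it would be the toolkit's own parser (for Indigo, the `loadMolecule` method of an `Indigo` instance). The compound identifiers and molfile contents are invented.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Tuple


def find_load_failures(records: Iterable[Tuple[str, str]],
                       load: Callable[[str], object]) -> Dict[str, str]:
    """Return {compound id: error message} for structures the loader rejects."""
    failures = {}
    for compound_id, molfile in records:
        try:
            load(molfile)
        except Exception as exc:  # each toolkit raises its own exception types
            failures[compound_id] = str(exc)
    return failures


def summarise(failures: Dict[str, str]) -> Counter:
    """Count how many compounds failed for each distinct reason."""
    return Counter(failures.values())


def toy_loader(molfile: str) -> str:
    """Stand-in for a real toolkit loader; rejects one made-up marker."""
    if "WIGGLY" in molfile:
        raise ValueError("query bond present")
    return molfile


records = [("ID-1", "plain structure"), ("ID-2", "WIGGLY bond structure")]
failures = find_load_failures(records, toy_loader)
print(summarise(failures))
```

Grouping by message is what makes a list of thousands of exceptions reviewable: a handful of distinct reasons usually covers most of the failures.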

The reasons they had failed were as follows:

1. The presence of a wiggly (query) bond
2. Two stereo bonds connected to one chiral centre. This was split into two sections:
   - Firstly, where the two bonds effectively canceled each other out and no stereochemistry was recorded at that centre.
   - Secondly, where the stereochemistry was present at that centre but having the two stereo bonds is against IUPAC drawing standards.
3. Presence of a stereo bond when there’s no chiral centre

Some examples of typical scenarios are shown below:

From this Indigo check, I was able to extract and fix these compounds, a lot of which won’t have new standard InChIs, just updated molfiles (i.e. they will keep their CHEMBL ID). For most of these compounds, to confirm the changes that I was going to make, I went back to the original published literature. It is interesting to note that the majority of compounds with the two stereo bonds on a single chiral centre had been extracted exactly as they had been drawn in the paper. 

These changes will be visible in ChEMBL_18 and I am aiming to incorporate this Indigo loader into our standard compound cleanup and loading protocol. This will probably be implemented under the Indigo toolkit extension that is found in Knime.

Any questions or queries about what I have done, please feel free to email:


Faculty positions for Bioinformatics and Computational Biology - The Crick Institute, London

An outstanding opportunity for computational research using approaches such as bioinformatics, genomics, systems biology, mathematical modelling, image analysis.

The Francis Crick Institute
The Francis Crick Institute will open at St Pancras in central London in 2015. Its research will use interdisciplinary approaches to investigate the biology of human health and disease, supported by core funding from CRUK, the MRC, and the Wellcome Trust, and by grants from UK and international funding agencies.

The Crick is expanding Computational Biology research as a key component of its scientific strategy. The institute will offer an outstanding environment for computational research, with excellent opportunities for wet/dry collaborations across the range of biomedical and clinical research disciplines, supported by a strategic alliance with the Wellcome Trust Sanger Institute. The new Crick laboratories will feature excellent computational facilities including a state-of-the-art data centre.

The London Research Institute
The London Research Institute (LRI) is the largest Cancer Research UK research institute, with 40 research groups focusing on fundamental cancer biology. The Institute is based in well-equipped laboratories at Lincoln's Inn Fields in central London, and at Clare Hall in Hertfordshire.

Computational Biology in Cancer
An outstanding opportunity for computational research using approaches such as bioinformatics, genomics, systems biology, mathematical modelling, image analysis. The LRI recruitment process for 2013 will be carried out jointly with the Crick Institute. We shall appoint outstanding scientists seeking to establish independent and innovative research programmes focussed on:

Newly appointed group leaders will receive core funding for research personnel, travel and consumables, and access to the Institute's comprehensive computational core facilities, backed by competitive employment terms. The new group leaders will move to the Crick laboratories in 2015.

Application deadline: 22 November 2013

Wednesday, 2 October 2013

ChEMBL Web Service Update 1

Over the last year we have been doing a lot of work designing and building an API layer on top of the ChEMBL database. The reason for adding this programmatic interface is to simplify many of the daily tasks we carry out on the database. From a technical perspective the API is actually a series of Object Relational Mapper (ORM) classes built on top of the ChEMBL database using the Python Django web framework. For many of our daily programmatic tasks we use the ORM directly, but we also expose the ORM as a RESTful Interface using Tastypie.

Some example tools and processes currently using the new API include the ChEMBL twitter bot and the database migration process (creating PostgreSQL and MySQL versions of the ChEMBL Oracle database during the ChEMBL release cycle). We are now at the stage where we can start to think about updating some of the existing larger services to run off the new API, and the first of these to make the transition are the ChEMBL Web Services. So, what have we done? Essentially we have rewritten the Web Services using the API (actually we use the ORM in this case) to interact with the ChEMBL data model. We have made this new set of Web Services available under the following base URL:

Those familiar with our current Web Services will notice we have added a ‘2’ to the end. An example call to the current live service looks like:

and the same call to the new Web Services looks like:

To refresh yourself on all methods we currently make available please visit the Web Service Documentation page.

The new Web Service base URL will provide you with all the same methods listed on the page above and, more importantly, the format of the results returned by the Web Services will also be the same. Our plan going forward is to run both services for the next 4-6 weeks and we ask users of the current ChEMBL Web Services to test the new versions (remember you just need to add a 2) and report back any issues encountered. Assuming we do not hit any major obstacles, after the 4-6 week period we will replace the current live services with the new ChEMBL API based services.

This first Web Service update is technology focused. We want to ensure the new services scale and perform well in the wild and that our end users do not notice a change (well, we are hopefully expecting you to see a performance boost). Further down the line we will make some bigger changes to the Web Services, such as reviewing methods, attributes and naming conventions, introducing paging and more. We will obviously consult the community and allow for a period of transition before releasing any such changes. Now is the time to tell us if you have any must-have new Web Service features.

Finally, it is not strictly true that the new Web Services are identical to the current live versions. There are a couple of new features we have built in, such as improved image rendering and JSONP responses. We will blog about these new features in the next couple of days, but in the meantime please have a look at the new ChEMBL Web Services and let us know how you get on.

The ChEMBL Team