ChEMBL Resources


Tuesday, 28 May 2013

PhD positions available at EMBL-EBI

The joy in submitting a PhD is insurmountable, and some say it feels like going to heaven (n.b. Ben!).

Anyway, four EMBL-EBI faculty are recruiting PhD students in this round of the EMBL International PhD Programme. The deadline for submissions is 17th June 2013 - so not long away now - get your skates on if you are interested. These four-year studentships will be available from October 2013 onwards.

The labs taking students this round are:

Click on the links above to see current group research interests.

There are also other excellent PhD studentship opportunities available at EMBL in Heidelberg and various of the other EMBL Outstations - for details see here.


Sunday, 26 May 2013

New Drug Approvals 2013 - Pt. VIII - Fluticasone furoate and Vilanterol (Breo Ellipta™)

ATC Code: R03AK10
Wikipedia: Vilanterol

On May 10th, the FDA approved Vilanterol (Tradename: Breo Ellipta; Research Code: GW-642444M), a long-acting beta2-adrenergic agonist, in combination with the already approved fluticasone furoate, an inhaled corticosteroid, for the long-term maintenance treatment of bronchospasm associated with chronic obstructive pulmonary disease (COPD).

Chronic obstructive pulmonary disease (COPD) is characterised by the occurrence of chronic bronchitis or emphysema, a pair of commonly co-existing diseases of the lungs in which the airways become narrowed. Bronchial spasms, sudden constrictions of the muscles in the walls of the bronchioles, occur frequently in COPD.

Vilanterol is a new long-acting beta2 receptor agonist that, through the activation of the beta2 adrenergic receptors present in the bronchial smooth muscle, leads to bronchodilation and consequently eases the symptoms of COPD.

The beta2 adrenergic receptor (Uniprot: P07550; ChEMBL: CHEMBL210) belongs to the G-protein coupled receptor (GPCR) type 1 family, and binds the endogenous neurotransmitter adrenaline. Since it is coupled to a Gs protein, its activation leads ultimately to an increase in cyclic AMP (cAMP), which causes relaxation of bronchial smooth muscle and inhibits the release of mediators of immediate hypersensitivity from cells, especially mast cells.

>ADRB2_HUMAN Beta-2 adrenergic receptor

There are 11 resolved 3D structures for this protein, with varying degrees of resolution (2.40 to 3.50 Å) and different fusion protocols. For instance, 3ny8 is a fusion protein of the human beta2 adrenergic receptor with bacteriophage T4 lysozyme, with a resolution of 2.84 Å and an inverse agonist bound to it (ICI-118,551, ChEMBL: CHEMBL513389):

The full list of PDBe entries can be found here.

The -terol USAN/INN stem covers bronchodilators structurally related to phenethylamine. Members of this class include, for example, Salmeterol (ChEMBL: CHEMBL1263), Formoterol (ChEMBL: CHEMBL1256786) and Indacaterol (ChEMBL: CHEMBL1095777), all long-acting beta2-adrenergic agonists also approved for the management of COPD. For a full list of compounds check ChEMBL.

Vilanterol (IUPAC Name: 4-[(1R)-2-[6-[2-[(2,6-dichlorophenyl)methoxy]ethoxy]hexylamino]-1-hydroxyethyl]-2-(hydroxymethyl)phenol; Canonical SMILES: OCc1cc(ccc1O)[C@@H](O)CNCCCCCCOCCOCc2c(Cl)cccc2Cl; ChEMBL: CHEMBL1198857; PubChem: 10184665; ChemSpider: 8360167; Standard InChI Key: DAFYYTQWSAWIGS-DEOSSOPVSA-N) is a synthetic small molecule with a molecular weight of 486.4 Da, 6 hydrogen bond acceptors, 4 hydrogen bond donors and an ALogP of 4.22. The compound is therefore fully compliant with the rule of five.
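As a rough illustration of the rule-of-five check mentioned above, the sketch below tests the four property values quoted in the text against the usual Lipinski thresholds (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen bond donors, at most 10 hydrogen bond acceptors). The function and property names are made up for this example, and ALogP is used as a stand-in for logP; this is not necessarily how ChEMBL itself computes the flag.

```javascript
// Usual Lipinski rule-of-five thresholds.
const RULE_OF_FIVE = {
  maxMolWeight: 500, // Da
  maxLogP: 5,
  maxDonors: 5,
  maxAcceptors: 10,
};

// Hypothetical helper: returns the names of the properties that break a rule.
function ruleOfFiveViolations(props) {
  const violations = [];
  if (props.molWeight > RULE_OF_FIVE.maxMolWeight) violations.push('molecular weight');
  if (props.logP > RULE_OF_FIVE.maxLogP) violations.push('logP');
  if (props.donors > RULE_OF_FIVE.maxDonors) violations.push('hydrogen bond donors');
  if (props.acceptors > RULE_OF_FIVE.maxAcceptors) violations.push('hydrogen bond acceptors');
  return violations;
}

// Values for vilanterol as quoted in the text (ALogP used in place of logP).
const vilanterol = { molWeight: 486.4, logP: 4.22, donors: 4, acceptors: 6 };

console.log(ruleOfFiveViolations(vilanterol).length === 0
  ? 'no rule-of-five violations'
  : 'violations: ' + ruleOfFiveViolations(vilanterol).join(', '));
```

All four values sit below their thresholds, so the list of violations is empty, which matches the statement in the text.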

Breo Ellipta is available as a dry powder inhaler and the recommended daily dose is one inhalation of fluticasone furoate/vilanterol 100/25 mcg. Following inhalation, vilanterol peak plasma concentrations are reached within 10 minutes, and its absolute bioavailability is 27.3%. At steady state, following intravenous administration, the mean volume of distribution of vilanterol (Vd/F) was 165 L in healthy subjects. Vilanterol is strongly bound to human plasma proteins (93.3%).

Vilanterol is primarily metabolized in the liver by CYP3A4. Therefore, concomitant administration of potent CYP3A4 inhibitors should be avoided. Vilanterol metabolites are primarily excreted in urine (70%) and feces (30%). The effective half-life (t1/2) for Vilanterol is approximately 21 hours in patients with COPD.

Breo Ellipta has been issued with a black box warning because Vilanterol, like all long-acting beta2-adrenergic agonists, increases the risk of asthma-related death.

The license holder for Breo Ellipta™ is GlaxoSmithKline, and the full prescribing information can be found here.

Friday, 24 May 2013

Paper: In silico applications of bioisosterism in contemporary medicinal chemistry practice

If you are looking for an up-to-date review of computational applications of bioisosterism in medicinal chemistry, then look no further. After an overview of the history and evolution of bioisosterism, the paper reports the various attempts to capture and quantify it, as well as to disseminate its examples in the context of modern computer-aided drug discovery.

Link to the paper here

%A G. Papadatos
%A N. Brown
%T In silico applications of bioisosterism in contemporary medicinal chemistry practice
%J WIREs Computational Molecular Science
%D 2013
%O doi:10.1002/wcms.1148


Thursday, 23 May 2013

New Drug Approvals 2013 - Pt. VII - Radium Ra 223 dichloride (Xofigo)

ATC code:
Wikipedia: Xofigo

On May 15th, 2013 the FDA approved the alpha particle-emitting Radium Ra 223 dichloride (Xofigo) as a radiotherapeutic agent for the treatment of patients with castration-resistant prostate cancer, symptomatic bone metastases and no known visceral metastatic disease. The therapeutic component is the alpha particle-emitting Radium 223 isotope. It mimics calcium and binds to bone minerals in areas of rapid cell division, where it preferentially affects cancer cells. The radiation causes high levels of DNA double-strand breaks in adjacent cells, killing rapidly dividing cells such as those in bone metastases.

After intravenous injection, Ra 223 is rapidly cleared from the blood and is either distributed primarily into bone or excreted into the intestine. The levels of radioactivity detected in the blood rapidly decrease and, at 24 hours, reach less than 1% of the administered dose. The alpha particle emission range of Ra 223 is 100 micrometers, which limits damage to normal surrounding tissue.

The molecular weight of Ra 223 dichloride, 223RaCl2, is 293.9 g/mol. Ra 223 has a half-life of 11.4 days and an activity of 1.9 MBq (51.4 microcurie)/ng.
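The quoted activity follows from the half-life: the decay constant is ln 2 divided by the half-life, and the activity of a sample is the decay constant times the number of atoms it contains. The sketch below redoes that arithmetic for one nanogram of Ra 223. The function name is made up for this example, and a molar mass of 223 g/mol for the isotope and 35.45 g/mol for chlorine are assumed.

```javascript
const AVOGADRO = 6.02214076e23;   // atoms per mole
const SECONDS_PER_DAY = 86400;
const BQ_PER_MICROCURIE = 3.7e4;  // 1 microcurie = 37,000 Bq

// Hypothetical helper: activity in Bq of one nanogram of a pure radionuclide.
function specificActivityBqPerNg(halfLifeDays, molarMassGPerMol) {
  const decayConstant = Math.LN2 / (halfLifeDays * SECONDS_PER_DAY); // per second
  const atomsPerNg = (1e-9 / molarMassGPerMol) * AVOGADRO;
  return decayConstant * atomsPerNg;
}

// Half-life from the text; molar mass of the isotope assumed to be 223 g/mol.
const bqPerNg = specificActivityBqPerNg(11.4, 223);
const mbqPerNg = bqPerNg / 1e6;
const microcuriePerNg = bqPerNg / BQ_PER_MICROCURIE;

// Molecular weight of 223RaCl2, assuming 35.45 g/mol for chlorine.
const radiumDichlorideWeight = 223 + 2 * 35.45;

console.log(mbqPerNg.toFixed(1) + ' MBq/ng');
console.log(microcuriePerNg.toFixed(1) + ' microcurie/ng');
console.log(radiumDichlorideWeight.toFixed(1) + ' g/mol');
```

Under these assumptions the result rounds to the 1.9 MBq (51.4 microcurie) per nanogram and 293.9 g/mol given in the text.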

In Phase 3 clinical trials, Xofigo increased survival to 14.9 months as compared to 11.3 for placebo.

Xofigo is a product of Bayer. The Prescribing Information can be found here.

Thursday, 16 May 2013

PKIS data in ChEMBL

The Protein Kinase Inhibitor Set (PKIS) made available by GSK was recently mentioned on In the Pipeline. In collaboration with GSK, we are making the data being generated on these compounds available via the ChEMBL database. We are also creating a portal for the compound set, where the structures can be browsed and downloaded, direct links to the data are provided and useful information can be posted. A preliminary version is available here: feedback would be appreciated.

The data generated on the PKIS set and deposited in ChEMBL may be downloaded in CSV format here (note that the Luciferase dataset described in the recent PLoS paper will be in the next release of ChEMBL). Alternatively, to view the data in the ChEMBL web interface, follow these steps:
  • On the home page, enter 'GSK_PKIS' in the search box and click on the 'Assays' button...

  • On the 'Please select...' menu on the right, choose 'Display Bioactivities'...

  • Again, on the 'Please select...' menu on the right, choose 'Download All Data (TAB)' to download the data as a tab-separated spreadsheet...
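Once the tab-separated file has been downloaded, it can be read with a few lines of code. The sketch below is a minimal parser that uses the first row as column names; it assumes plain tab-separated text with no quoted or multi-line fields. The function name and the column names in the sample are hypothetical and are not the actual headers of the ChEMBL download.

```javascript
// Hypothetical helper: turns tab-separated text into an array of records,
// keyed by the column names found in the first row.
function parseTsv(text) {
  const rows = text.split(/\r?\n/).filter((line) => line.length > 0);
  if (rows.length === 0) return [];
  const header = rows[0].split('\t');
  return rows.slice(1).map((line) => {
    const cells = line.split('\t');
    const record = {};
    header.forEach((name, i) => {
      // Missing trailing cells become empty strings.
      record[name] = cells[i] !== undefined ? cells[i] : '';
    });
    return record;
  });
}

// Made-up sample with hypothetical column names, for illustration only.
const sampleTsv = 'compound_id\tassay_id\tvalue\nC1\tA1\t7.2\nC2\tA1\t\n';
const sampleRecords = parseTsv(sampleTsv);
console.log(sampleRecords.length + ' records parsed');
```

Every value is kept as a string, so numeric columns would still need converting (for example with `Number`) before any analysis.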

To complement these datasets, other data for these compounds held in ChEMBL, such as that extracted from the medicinal-chemistry literature, may be downloaded in CSV format here

For information about obtaining the compound set for screening, please contact Bill Zuercher at GSK.

Wednesday, 15 May 2013

ChEMBL_16 Released

We are pleased to announce the release of ChEMBL_16. This version of the database was prepared on 7th May 2013 and contains:

1,481,473 compound records
1,295,510 compounds (of which 1,292,344 have mol files)
11,420,351 activities
712,836 assays
9,844 targets
50,095 documents
19 activity data sources

You can download the data from the ChEMBL ftpsite, and do not forget to read the ChEMBL_16 Release Notes.

Data changes since the last release
ChEMBL_16 includes the Millipore Kinase Screening publication (CHEMBL2218924), which is a kinase screening panel data set focused on 158 known kinase inhibitors, and the OSDD Malaria Screening dataset (CHEMBL2113921), which is a set of anti-malarial compounds and bioactivity data provided by the OSDD Malaria consortium.
In addition to our regular publication and dataset updates, we are now also loading supplementary bioactivity datasets. In this example the original paper from GSK was published in 2010 (CHEMBL1157114), and with the release of ChEMBL_16 we now provide 2 supplementary datasets (CHEMBL2218064 and CHEMBL2094195). You can see the original paper and supplementary datasets in the screenshot below (this also demonstrates the new document search functionality we have added to the interface):

We would like to grow our supplementary bioactivity datasets, so please get in touch if you have any similar data you would like to deposit in the ChEMBL database. Stefan Senger from GSK has put together the following slides, which provide more details on the pros and pros of depositing supplementary bioactivity data. (Also, thanks to Derek Lowe over at In The Pipeline for the following blog post.)

Interface changes since the last release:
We have made a number of changes to the interface, which are listed below:
  • Document Search - Submit a keyword search against journal articles and datasets loaded into the database
  • Browse Targets - We have improved the tree browser on the protein classification and organism browse targets pages
  • Browse Drugs - Now allows searching on USAN stem and ATC code definitions
  • Updated FAQ pages - see here
  • Target Report Card - Now contains a target relation section, providing links between targets sharing protein components. The target report card also includes links to CREDO and TIMBAL databases
  • Compound Report Card - Includes a link to the NCI Resolver service, to retrieve additional synonyms for a compound
In addition to our regular set of downloads (Oracle, MySQL, PostgreSQL) you will also find an RDF version of the ChEMBL database. The current version is 16.0 and the files are available to download here. You can expect some minor changes in the RDF between now and the ChEMBL_17 release, and these will be represented by increments in the minor version number.
The ChEMBL Team

Monday, 13 May 2013


We would like to draw the attention of readers of the ChEMBL-og to an excellent new paper in the Journal of Cheminformatics, describing the work of Egon Willighagen and co-authors in building the first published, publicly available version of ChEMBL data in RDF form (ChEMBL RDF). The paper also provides details on a number of linked-data-based applications built on top of an RDF data model, demonstrating the benefits of the data transformation. More details about the paper are provided here and the link to the paper is here.

%T The ChEMBL database as linked open data
%A E.L. Willighagen
%A A. Waagmeester
%A O. Spjuth
%A P. Ansell
%A A.J. Williams
%A V. Tkachenko
%A J. Hastings
%A B. Chen
%A D. J Wild
%J J. Cheminf.
%D 2013
%V 5
%O doi:10.1186/1758-2946-5-23

The ChEMBL group have been funded by the IMI OpenPhacts project to build and deploy an RDF version of ChEMBL (which we are currently calling ChEMBL ChEMBL RDF, sorry for the confusion!), with changes in content and curation closely coupled to, and tracking where required, the current relational schema database.

The ChEMBL group's version is available to download and can be picked up from the ChEMBL ftpsite here. We have run some project workshops on the ChEMBL ChEMBL RDF, but otherwise have remained a little bit quiet on its existence as we continue to make a number of small changes (hence the minor version number increments); 15.8 is the latest version, and you can expect 16.0 this week. Just to be clear, the ChEMBL ChEMBL RDF version we are making available is not the same as the version described in the Journal of Cheminformatics paper, and the functionality and queries will be non-interoperable.

Our OpenPhacts involvement is not the only reason we have created an RDF version of the ChEMBL database; we have had many requests from the broader global ChEMBL community to provide RDF in our official release process, so going forward this is something we will commit to providing. We are also keen to minimise the impact of changes we make to the ChEMBL relational model, which has evolved significantly since its first release (ChEMBL data integrators out there will certainly have noticed some pretty big changes in our ChEMBL_15 release). So we also commit to keeping the RDF data model in sync with the relational model.


Friday, 10 May 2013

New Drug Approvals 2013 - Pt. VI - Gadoterate Meglumine (Dotarem™)

ATC Code: V08CA02
Wikipedia: Gadoteric Acid

On March 20th 2013, the FDA approved Gadoteric Acid (as the meglumine salt; tradename: Dotarem; research code: P 449; CHEMBL: CHEMBL2219415), a gadolinium-based contrast agent (GBCA) indicated for intravenous use with magnetic resonance imaging (MRI) in brain (intracranial), spine and associated tissues of patients aged 2 years and older, to detect and visualize areas with disruption of the blood brain barrier (BBB) and/or abnormal vascularity of the central nervous system (CNS).

When placed in a magnetic field, Gadoteric Acid develops a magnetic moment. This magnetic moment enhances the relaxation rates of water protons in its vicinity, leading to an increase in signal intensity (brightness) of tissues. Gadoteric Acid enhances the contrast in MRI images, by shortening the spin-lattice (T1) and the spin-spin (T2) relaxation times.

Other GBCAs have already been approved by FDA for use in patients undergoing CNS MRI and these include Gadopentetate Dimeglumine (approved in 1988 under the tradename Magnevist; ChEMBL: CHEMBL1200431; PubChem: CID55466; ChemSpider: 396793), Gadoteridol (approved in 1992 under the tradename Prohance; ChEMBL: CHEMBL1200593; PubChem: CID60714; ChemSpider: 54719), Gadodiamide (approved in 1993 under the tradename Omniscan; ChEMBL: CHEMBL1200346; PubChem: CID153921; ChemSpider: 135661), Gadoversetamide (approved in 1999 under the tradename Optimark; ChEMBL: CHEMBL1200457; PubChem: CID444013; ChemSpider: 392041), Gadobenate Dimeglumine (approved in 2004 under the tradename Multihance; ChEMBL: CHEMBL1200571; PubChem: CID49799998; ChemSpider: 25046318) and Gadobutrol (approved in 2011 under the tradename Gadavist; ChEMBL: CHEMBL2218860; PubChem: CID15814656; ChemSpider: 26330337).

Gadoteric Acid is a macrocyclic ionic contrast agent, consisting of the chelating agent DOTA and gadolinium (Gd3+).
IUPAC: gadolinium(3+);2-[4,7,10-tris(carboxymethyl)-1,4,7,10-tetrazacyclododec-1-yl]acetic acid
Canonical Smiles: [Gd+3].OC(=O)CN1CCN(CC(=O)[O-])CCN(CC(=O)[O-])CCN(CC(=O)[O-])CC1
InChI: InChI=1S/C16H28N4O8.Gd/c21-13(22)9-17-1-2-18(10-14(23)24)5-6-20(12-16(27)28)8-7-19(4-3-17)11-15(25)26;/h1-12H2,(H,21,22)(H,23,24)(H,25,26)(H,27,28);/q;+3/p-3

The recommended dose of Gadoteric Acid is 0.2 mL/kg (0.1 mmol/kg) body weight administered as an intravenous bolus injection at a flow rate of approximately 2 mL/second for adults and 1-2 mL/second for pediatric patients. Gadoteric Acid has a volume of distribution of 179 mL/kg and 211 mL/kg in female and male subjects, respectively, roughly equivalent to that of extracellular water, and an elimination half-life of about 1.4 hr and 2.0 hr in female and male subjects, respectively. Gadoteric Acid does not undergo plasma protein binding and it is not known to be metabolized. It is excreted primarily in the urine with 72.9% and 85.4% eliminated within 48 hours in female and male subjects, respectively. In healthy subjects, the renal and total clearance rates are comparable, with a renal clearance of 1.27 mL/min/kg and 1.40 mL/min/kg in female and male subjects, respectively, and a total clearance of 1.74 mL/min/kg and 1.64 mL/min/kg in female and male subjects, respectively.

All GBCAs, including Gadoterate Meglumine, carry a boxed warning about the risk of nephrogenic systemic fibrosis (NSF), a condition associated with the use of GBCAs in certain patients with kidney disease.

The license holder for Gadoterate Meglumine is Guerbet LLC and the prescribing information can be found here (Gadoteric Acid is also approved in Europe and the SPC can be found here).

Thursday, 9 May 2013

Compiling inchi-1 to JavaScript

More and more software libraries are being ported to JavaScript. The best example is the JavaScript/HTML5 Citadel demo of the Unreal Engine. So why not try it with some chemical stuff? One of the most important chemical software libraries is IUPAC InChI. It's also extremely hard to reimplement as it's written in low-level, functional-style C. On the other hand it's just a few headers and source files, without any dependencies, so it's a perfect use case for Emscripten.

Emscripten 'is an LLVM-to-JavaScript compiler'. It can be used as a drop-in replacement for standard tools such as gcc or make. Recently it got support for asm.js - an optimizable, low-level subset of JavaScript.

I wasn't the first to come up with this idea - one of our local heroes, Noel O'Boyle, wrote a set of articles about translating the InChI code into JavaScript on his blog. I didn't know about his work during my experiments, which is good, because I took a slightly different approach and came up with different results:
  1. I decided to compile the inchi-1 binary (by exposing its main function) rather than the library because, according to the readme file in the InChI distribution package, the binary 'does extensively check the input data and does provide diagnostic concerning input structure', so it's the only tool that can be used as an InChI generator with a 100% guarantee of correct results for all input files.
  2. I used the '-O2 -s ASM_JS=1' flags to optimize for speed.
  3. The resulting JavaScript code (Emscripten-generated HTML with embedded JS) weighed 2.8 MB, and 732 kB after zip compression (all modern servers and browsers support compressed files). The original inchi-1 binary is about 1.1 MB, so this sounds reasonable.
Of course there are some drawbacks to my approach - the most obvious one is IO. inchi-1 is a command-line tool that expects a file or plain text as input and prints some text to stdout and stderr. JavaScript doesn't have any standard input or output, so this behavior must somehow be mapped to the browser environment. Emscripten maps output to a specific textarea element, which is reasonable. On the other hand, any request for user input is mapped to a JavaScript prompt window. This prompt can accept one line of text at a time. Molfiles contain many lines, so entering a molfile line by line is tedious.

The solution to this problem would be to add a file input to the webpage and access it via the JavaScript Blob interface. With the file selected, one would allocate some memory in Emscripten, using hints from this SO question, and pass it to the process_single_input function from the inchimain.c file (this should be exported instead of main).

So far I haven't solved the last issue. You can check the proof-of-concept here. To use it, open the link in your browser, open the JavaScript console (Control-Shift-K on Firefox, Control-Shift-J on Chrome), then type (as two separate commands, pressing Enter after each one):

bla = Module.cwrap('process_single_input', 'string', 'string')
bla('bla -STDIO')

After that, the standard JavaScript prompt will pop up. You have to paste your mol file there, line by line. If a line should be empty (usually the 1st and 3rd lines are), just press Enter in the input box. After the last line (M END), hit Cancel instead of OK. Then select the checkbox suppressing all further popups and press OK. If you entered all the mol file lines correctly, you will see the result!
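The manual line-by-line entry could probably be automated by replacing the prompt with a function that hands out the molfile one line per call. The sketch below is untested against the proof-of-concept page. It assumes that the generated code reads its input through `window.prompt`, as described above, and that a cancelled prompt (a `null` return value) is treated as the end of input. `makeLineFeeder` is a hypothetical helper, not part of Emscripten or InChI.

```javascript
// Hypothetical helper: returns a prompt-like function that yields one
// molfile line per call, then null (the equivalent of pressing Cancel).
function makeLineFeeder(molfileText) {
  // Normalise Windows and old Mac line endings before splitting.
  const lines = molfileText.replace(/\r\n?/g, '\n').split('\n');
  // Drop the single empty entry produced by a trailing newline.
  if (lines.length > 0 && lines[lines.length - 1] === '') {
    lines.pop();
  }
  let next = 0;
  return function feedLine() {
    // Empty molfile lines are returned as '', like pressing Enter.
    return next < lines.length ? lines[next++] : null;
  };
}

// Intended use in the browser console of the proof-of-concept page
// (assumption: stdin is read via window.prompt):
//   window.prompt = makeLineFeeder(molfileText);
//   bla('bla -STDIO');
```

This only removes the typing; reading the file through a file input and passing it straight to process_single_input, as outlined earlier, would still be the cleaner fix.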


Wednesday, 8 May 2013

Direct submissions of data to ChEMBL and the Open PHACTS project

Don’t we just love the fact that these days so much bioactivity data is freely available at no cost (to the end user)? I think we do. The more, the better. So, what would your answer be if someone asked you if you consider it to be a good idea if they would deposit some of their unpublished bioactivity data in ChEMBL? My guess is that you would be all in favour of this idea. 'Go for it', you might even say. On the other hand, if the same person would ask you what you think of the idea to deposit some of ‘your bioactivity data’ in ChEMBL the situation might be completely different.  

First and foremost you might respond that there is no such bioactivity data that you could share. Well, let’s see about that later. What other barriers are there? If we cut to the chase then there is one consideration that (at least in my experience) comes up regularly, and this is the question: 'What’s in it for me?' Did you ask yourself the same question? If you did and you were thinking about ‘instant gratification’, I haven’t got a lot to offer. Sorry to disappoint you. However, since when is science about ‘instant gratification’? If we all started to share the bioactivity data that we can share (and yes, there is data that we can share but don’t) instead of keeping it locked up in our databases or spreadsheets, this would make a huge difference to all of us. So far the main and almost exclusive way of sharing bioactivity data is through publications, but this is (at least in my view) far too limited. In order to start to change this (at least a little bit) the concept of ChEMBL supplementary bioactivity data has been introduced (as part of the efforts of the Open PHACTS project).

Here is how it works: If you have unpublished bioactivity data that has been generated in an assay that can be found in ChEMBL (since the publication where the assay is described is also in ChEMBL), you can now deposit this data in ChEMBL (see for an example). The obvious situation would be one where only a subset of the results have been reported in the publication but there are many more results (e.g. inactives). If you work in an industrial setting and feel that you are not in a position to release additional chemical structures, you could think about depositing bioactivity data for compounds in (older) patents. Or you may have reported bioactivity data in a poster. These are only examples and there are many more opportunities. In some cases we might explore new territory and the progress might be slow, but if we don’t try new things we are stuck with what we have. 'Do we really want this?' I hope the answer is no. So, let’s not focus on ‘instant gratification’ but help to grow the body of freely available bioactivity data by contributing to ChEMBL supplementary bioactivity data. If we could just give it a go it might make a difference. The concept might be quite restricted (e.g. the assay needs to be published) but we need to start somewhere. If you want to find out more about ChEMBL supplementary bioactivity data, why not drop ChEMBL Help a line and put ‘ChEMBL supplementary bioactivity data’ in the subject field? And don’t worry, you are not committing yourself by wanting to know more.

ChEMBL, and the whole world of drug discoverers, is looking forward to hearing from you.  

Stefan Senger

Friday, 3 May 2013

The world of the ChEMBL-og

The map above shows the Google Analytics map view of the users of the ChEMBL-og. Being the completist that I am, it always bugs me that there are still quite a few places that don't know about the ChEMBL-og. So if you have some friends in those lonely grey countries, such as Greenland, North Korea, Mali, Niger, Chad, Gabon, Central African Republic, DR Congo, etc., tell them about the ChEMBL-og!


Thursday, 2 May 2013

PhD Studentship at Babraham - Systems Pharmacology Models of Druggable Targets and Disease Mechanisms

Our friendly neighbours at The Babraham Institute are looking for a PhD candidate to work on systems pharmacology models, as part of a collaboration between the Le Novère group (Babraham), the Hermjakob group (EMBL-EBI) and the pharmaceutical company GlaxoSmithKline. The Le Novère group uses quantitative computational models to understand cellular and molecular processes, and develops community services that facilitate research in computational systems biology.

One of the major challenges of drug discovery is to demonstrate the efficacy of a potential new drug. This goes beyond the development of a potent molecule - it also implies a good understanding of the biological context, how it relates to a particular disease, and the drug's mechanism of action. The availability of relevant Systems Pharmacology models can therefore have a significant impact. The most comprehensive repository of Systems Biology models in machine readable language is BioModels Database, created by Le Novère and maintained at the EBI. In spite of its extensive collection, BioModels Database only covers a fraction of the Systems Pharmacology models described in the literature. In addition, no analysis has been performed on how they map to druggable targets and/or disease mechanisms. 

The candidate will: 

  1. Use state-of-the-art text-mining methods to extract and analyse the space of Systems Pharmacology models currently described in the literature, with particular emphasis on their relevance to druggable targets and disease mechanisms;
  2. Identify the models offering the best opportunities for the discovery of new drugs, and incorporate them into BioModels Database;
  3. Explore and assess the applicability of those models to real drug development cases, evaluating their quality, advantages, caveats, overlaps, gaps and impact on the demonstration of drug efficacy against specific indications.

The candidate must have an extensive knowledge of molecular biology and pharmacology, and a solid basis in numerical analysis and statistics. Advanced familiarity with data representation and programming skills will also be desirable.

  • Thiele I et al. A community-driven global reconstruction of human metabolism. Nat Biotechnol. 2013 Mar 3. Online advance publication.
  • Cucurull-Sanchez L et al. Relevance of systems pharmacology in drug discovery. Drug Discov Today. 2012 17: 665-670
  • Le Novère N et al. BioModels Database: a free, centralized database of curated, published, quantitative kinetic models of biochemical and cellular systems. Nucleic Acids Res. 2006 34: D689-D691.

For any further information or to express interest, please contact Nicolas Le Novère (n.lenovere (at)