ChEMBL Resources


Friday, 29 November 2013

A call for new MMV Malaria Box screening data depositions

Last year, MMV released the MMV Malaria Box, a physical set of 400 probe- and drug-like compounds with confirmed anti-malarial activity. The 'Box' has since been distributed to a large number of academic labs around the world, where the compounds are screened against other plasmodia strains and pathogens such as schistosoma and mTB. The assay results have started coming back in the form of data depositions and we, as MMV partners, are doing our best to integrate them into both the malaria-data database and the main ChEMBL one. Recent examples of such MMV Malaria Box screening data depositions include:

In addition, we curate and integrate the bioactivity data produced by the excellent Open Source Malaria project.

The value of sharing screening data openly, especially in the field of NTD basic research, cannot be emphasised enough, as it:
  • minimises the duplication of effort among labs
  • accelerates research outcomes
  • leads to more informed decisions
  • fosters synergies and collaborations among researchers
  • shifts the focus of competition to ideas rather than data access rights
Therefore, we would like to encourage you to deposit your NTD/MMV box screening data, both positive AND negative, regardless of whether they have been published or not, to ChEMBL. We will make sure that your data are appropriately integrated, searchable and downloadable with their provenance visible and properly acknowledged. More importantly, we will make sure your data are open and freely shareable by everyone. 

If you would like to deposit your data here or enquire further, please get in contact.

The ChEMBL Team

Thursday, 21 November 2013

Paper: myChEMBL - a virtual machine implementation of open data and cheminformatic tools

We have just had a paper published in Bioinformatics on myChEMBL - the Linux VM that contains a fully functional version of the ChEMBL database. The paper is here.

myChEMBL is available for download at:

A warning: it is a fairly big download (ca. 18 GB), so try to do this over a fast, stable connection.

Source code is available here:

%T myChEMBL: A virtual machine implementation of open data and cheminformatics tools
%J Bioinformatics
%D 2013
%O DOI:10.1093/bioinformatics/btt666
%A M. Davies
%A G. Papadatos
%A F. Atkinson
%A J.P. Overington

Job: Chemoinformatician at the Karolinska

Some of our collaborators at the Karolinska have a great job available - the advert is here.

Division: Chemical Biology Consortium Sweden (CBCS) are looking for a highly motivated and talented Cheminformatics Scientist to support and coordinate a wide scope of research informatics applications and data analysis at our Stockholm facilities.

Duties: The desired candidate will have a demonstrated track record in managing large volumes of scientific data in support of basic research and/or drug discovery projects and should have significant experience with in-house and commercial software solutions that facilitate data capture, analysis and visualization in small molecule research and drug design. Responsibilities include:
  • Evaluation and implementation of a nationally encompassing chemoinformatics system for the SciLifeLab community.
  • Maintenance, configuration, monitoring, and/or troubleshooting of scientific applications and underlying software.
  • Partnering and interaction with software vendors and external professional service staff to resolve issues and implement enhancements.
  • Assisting scientists from multiple disciplines with data capture, analysis and visualization.

Wednesday, 20 November 2013

Competition Time - Teach-Discover-Treat 2014

Teach-Discover-Treat (TDT) is excited to announce our 2014 Competition. We have four exciting challenges that focus on developing and disseminating computational workflows for drug discovery of neglected diseases with a premium on reproducibility.

Three cash prizes - plus partial reimbursement of travel - will be awarded! Winners are required to present their work at the TDT Award symposium during the Fall 2014 ACS National Meeting in San Francisco, California.

Create and submit computational workflows that inspire drug discovery activities using freely available software tools. Detailed information about the 2014 Competition can be found here:

The submission deadline is February 3, 2014.

The TDT Steering Committee

Hanneke Jansen, Rommie Amaro, Jane Tseng, Wendy Cornell, Patrick Walters and Emilio Xavier Esposito


Friday, 15 November 2013

New Drug Approvals 2013 - Pt. XIX - Ibrutinib (Imbruvica™)

ATC Code:
Wikipedia: Ibrutinib

On November 13, 2013, the FDA approved Ibrutinib (Imbruvica™) for the treatment of patients with mantle cell lymphoma (MCL) who have received at least one prior therapy. MCL is a subtype of B-cell lymphoma and accounts for 6% of non-Hodgkin's lymphoma cases. In an open-label, multi-center, single-arm trial of 111 previously treated patients, Ibrutinib showed a 65.8% response rate.

Ibrutinib is an irreversible inhibitor of the Tyrosine-protein kinase BTK (Uniprot:Q06187; ChEMBL:CHEMBL5251; canSAR target synopsis) and is the first approved targeted BTK inhibitor. It forms a covalent bond with a cysteine residue in the BTK active site via a Michael acceptor mechanism, leading to inhibition of BTK enzymatic activity.

Ibrutinib (ChEMBL:CHEMBL1873475; canSAR drug synopsis; also known as CRA-032765 and PCI-32765) has the formula C25H24N6O2 and a molecular weight of 440.50. It is absorbed after oral administration with a median Tmax of 1-2 hours. After administration of a 560 mg dose, the observed AUC is 953 ± 705 ng⋅h/mL. The apparent volume of distribution at steady state (Vd,ss/F) is approximately 10000 L and the half-life is 4 to 6 hours.

Imbruvica™ is produced by Pharmacyclics, Inc.

The full Prescribing Information is here

New Drug Approvals 2013 - Pt. XVIII - Obinutuzumab (Gazyva™)

ATC Code: L01XC15
Wikipedia: Obinutuzumab

On November 1, 2013, the FDA approved obinutuzumab (Gazyva™) for use in combination with chlorambucil (a nitrogen mustard alkylating agent) for the treatment of patients with previously untreated chronic lymphocytic leukaemia (CLL). CLL is the most common type of leukaemia, accounting for 35% of all reported leukaemias (see CRUK CLL page). In a randomized three-arm clinical study, obinutuzumab in combination with chlorambucil improved the progression-free survival (PFS) of patients to 23.0 months, compared to 11.1 months for chlorambucil alone.

Obinutuzumab (CHEMBL1743048) is a humanized anti-CD20 monoclonal antibody of ca. 150 kDa molecular weight. Its target, the B-lymphocyte antigen CD20, is the product of the gene MS4A1 (Uniprot: P11836; ChEMBL: CHEMBL2058; canSAR target synopsis). The CD20 antigen is expressed on the surface of pre B- and mature B-lymphocytes. Obinutuzumab mediates B-cell lysis through three main routes:
  • Engagement of immune effector cells, resulting in antibody-dependent cellular cytotoxicity and antibody-dependent cellular phagocytosis
  • Direct activation of intracellular death signaling pathways
  • Activation of the complement cascade.

The geometric mean (CV%) volume of distribution of obinutuzumab at steady state is approximately 3.8 L. The terminal clearance is 0.09 (46%) L/day and the terminal half-life is ~28.4 days.

Obinutuzumab has been issued with a boxed warning because of the following observed events: reactivation of Hepatitis B Virus (HBV), in some cases resulting in fulminant hepatitis, hepatic failure, and death; and Progressive Multifocal Leukoencephalopathy (PML), resulting in death.

Gazyva™ is produced by Genentech, Inc. The full Prescribing Information is here.

Thursday, 14 November 2013

New ChEMBL-NTD Depositions

We are very pleased to announce the release of two new datasets on the ChEMBL-NTD portal.

The first dataset is provided by the Drugs for Neglected Diseases initiative (DNDi) and is focused on the selection and optimization of hits from a high-throughput phenotypic screen against Trypanosoma cruzi. The paper describing the dataset in more detail can be accessed here and the data can be downloaded from here.
The second dataset is from the DeRisi Lab at UCSF and is focused on the screening of MMV's Malaria Box compounds in Plasmodium falciparum, to understand whether anti-malarial compounds target the apicoplast organelle. More details about the dataset can be found here and the data can be downloaded from here.

Both datasets will be loaded into the next version of ChEMBL, which is due out early next year.

The ChEMBL - Neglected Tropical Disease portal is a repository for Open Access primary screening and medicinal chemistry data directed at neglected diseases. If you would like to deposit your data here please get in contact.

The ChEMBL Team

Tuesday, 12 November 2013

RDKit and Raphael.js

The ChEMBL group had the honour of hosting the second RDKit UGM. It was a great way to catch up with the RDKit community, find out what they are working on and learn about new features the toolkit offers. We gave two talks during the meeting, so if you want to know how Clippy can make interacting with different chemical formats on your desktop easier, go here, and if you want to learn about wrapping RDKit up in a RESTful Web Service a.k.a. Beaker (to be described in a future blog post), go here. Many discussions about new features RDKit could offer were had throughout the meeting and one which caught my attention was support for plotting compound images on HTML5 Canvas.

Unable to participate in the hackathon held on the final day, I set about hosting my own small hackathon during the weekend (only 1 attendee). The result of this weekend coding effort was a pull request made against the RDKit GitHub repo, introducing a new class called JSONCanvas.

Technical Details

Traditionally, the model for generating images has relied on the server sending a binary representation of the compound (e.g. .png, .jpeg) to the client. With advances in browser technologies, it is now feasible to rely on the client to generate the graphical representation of the compound, as it now has access to many methods that allow it to handle geometrical primitives. It can decide whether those primitives should be rendered as SVG, VML or even HTML5 Canvas (check out Kinetic.js for HTML5 canvas rendering, as it knows how to draw some core shapes on canvas).

My solution uses Raphael.js - a JavaScript library for drawing vector graphics in the browser. For displaying the graphic it uses SVG on browsers that support this format; on older browsers it falls back to VML. In the library documentation we can find a very interesting method called Paper.add(). This method accepts JSON containing an array of geometrical objects (such as circle, rectangle, path) to be displayed and returns a handle for manipulating (moving, rotating, scaling) the object as a whole. This means that if we could create a JSON object which uses shapes to represent a chemical compound, we could draw it or manipulate the compound directly. The new JSONCanvas class produces the previously described JSON object for any* given RDKit compound.

(*I am sure we might find a couple of exceptions)
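To make the idea a bit more concrete, here is a minimal Python sketch of the kind of JSON array that Paper.add() accepts. It is not the actual JSONCanvas output: the atom coordinates, the `atoms`/`bonds` inputs and the helper name `shapes_to_json` are all invented for illustration. Only the general element format (a `type` plus drawing attributes) follows the Raphael.js documentation.

```python
import json

# Hypothetical 2D layout for ethanol (C-C-O); the coordinates are invented.
atoms = [
    {"symbol": "C", "x": 20, "y": 60},
    {"symbol": "C", "x": 60, "y": 40},
    {"symbol": "O", "x": 100, "y": 60},
]
bonds = [(0, 1), (1, 2)]  # pairs of atom indices


def shapes_to_json(atoms, bonds):
    """Describe a molecule as a JSON list of Raphael-style shape objects."""
    shapes = []
    # Each bond becomes an SVG path: move to the first atom, line to the second.
    for begin, end in bonds:
        a, b = atoms[begin], atoms[end]
        shapes.append({
            "type": "path",
            "path": "M{},{}L{},{}".format(a["x"], a["y"], b["x"], b["y"]),
            "stroke": "#000",
        })
    # Heteroatoms get a text label; carbons stay implicit, as in skeletal formulae.
    for atom in atoms:
        if atom["symbol"] != "C":
            shapes.append({
                "type": "text",
                "x": atom["x"],
                "y": atom["y"],
                "text": atom["symbol"],
                "fill": "#f00",
            })
    return json.dumps(shapes)


payload = shapes_to_json(atoms, bonds)
print(payload)
```

On the client, a string like this would be parsed and handed to Paper.add(), which is where the actual drawing happens.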

But why?

1. Cost - less server processing is needed, as the server no longer has to rasterise the image (a step which often requires third-party drawing libraries as well).

2. Bandwidth - less bandwidth is required to transfer the JSON representation of compounds. Also, as it is text-based, you can employ further compression (by configuring your server to send gzipped JSON, which most modern browsers understand) or use AMF.

3. Accuracy - improved scaling quality made possible with vector graphics.

4. Interactivity - compounds rendered using JSON on the client side can handle standard events such as click, hover, etc. Complex operations (animating, sorting, dragging, ...) can also be applied to these objects.
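As a rough illustration of the bandwidth point, the sketch below gzips a JSON payload with Python's standard library and compares sizes. The payload is made up (200 repetitive path objects standing in for a few drawn compounds), so the numbers it prints say nothing about real JSONCanvas output; it only shows that this kind of text compresses losslessly.

```python
import gzip
import json

# A made-up payload: 200 line segments standing in for a few drawn compounds.
shapes = [
    {
        "type": "path",
        "path": "M{},{}L{},{}".format(i, i + 1, i + 2, i + 3),
        "stroke": "#000",
    }
    for i in range(200)
]

raw = json.dumps(shapes).encode("utf-8")
compressed = gzip.compress(raw)

print("raw bytes:    ", len(raw))
print("gzipped bytes:", len(compressed))

# The round trip is lossless, so the client sees exactly the same JSON.
restored = json.loads(gzip.decompress(compressed).decode("utf-8"))
```

In practice the compression would normally be configured on the web server rather than done by hand like this.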


As an example of this technique, please look at our chemical game. To give you some idea of scale and performance, the game loads 1000 compounds when the page first loads. If you want to see a raw example, please explore the source of my demo page. Other possible applications include:

1. Online compound cloud (similar to a tag cloud but with compound images instead of words). Such a cloud can be used to visualise compound similarity.

2. Compound stream - a substructure search can sometimes return a very large number of results. Such results can be represented as a pseudo-infinite stream of compounds - only a small portion of the results is presented on the screen, but scrolling down causes more results to be rendered while older ones are discarded.
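The compound stream idea boils down to a sliding window over a lazily consumed result list. Here is a small Python sketch of that logic only; the class name `CompoundStream`, the window and batch sizes and the placeholder identifiers are all assumptions for illustration, and a real implementation would live in the browser and render shapes instead of storing strings.

```python
from collections import deque


class CompoundStream:
    """Keep only the most recent `window` results 'on screen'."""

    def __init__(self, results, window=5, batch=2):
        self._results = iter(results)        # lazily consumed search results
        self._batch = batch                  # how many to load per scroll step
        self.visible = deque(maxlen=window)  # older items fall off automatically

    def scroll(self):
        """Render the next batch and return how many new items were added."""
        added = 0
        for _ in range(self._batch):
            try:
                self.visible.append(next(self._results))
            except StopIteration:
                break
            added += 1
        return added


# Hypothetical hit list: the identifiers are placeholders, not real results.
hits = ["compound_{}".format(i) for i in range(1, 11)]
stream = CompoundStream(hits, window=5, batch=2)
for _ in range(4):
    stream.scroll()
print(list(stream.visible))
```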

How can I use it?

1. You can download my fork of RDKit containing all relevant changes.

2. Today Greg Landrum, the creator of RDKit, made his own branch containing a modified version of the original pull request, so hopefully this is on its way to being accepted into the master branch in the future.

As a group we are happy to contribute to such a great open source library!


Sunday, 10 November 2013

USAN Watch: September 2013

The USANs for September 2013 have recently been published. We actually missed September, due to a switch-over in service for the INNs, but now they're here.

| USAN | Research Code | InChIKey (Parent) | Drug Class | Therapeutic class | Target |
| --- | --- | --- | --- | --- | --- |
| aducanumab | BIIB-037 | n/a | monoclonal antibody | therapeutic | beta amyloid |
| aptorsen-sodium | OGX-427 | n/a | oligonucleotide | therapeutic | HSP27 |
| asfotase-alfa | ALXN-1215, ENB-0040 | n/a | enzyme | therapeutic | n/a |
| batefenterol, batefenterol-succinate | GSK-961081A | URWYQGVSPQJGGB-DHUJRADRSA-N | synthetic small molecule | therapeutic | Muscarinic receptors, B2 receptor |
| bococizumab | RN-316, PF-04950615 | n/a | monoclonal antibody | therapeutic | PC9 |
| dactolisib, dactolisib-tosylate | NVP-BEZ235-NX | JOGKUKXHTYWRGZ-UHFFFAOYSA-N | synthetic small molecule | therapeutic | MTOR, PI3K |
| deldeprevir, deldeprevir-sodium | ACH-0142684, ACH-2684 | UDMJANYPQWEDFT-ZAWFUYGJSA-N | synthetic small molecule | therapeutic | HCV NS3 PR |
| etiguanfacine | SSP-1871 | NWKJFUNUXVXYGE-UHFFFAOYSA-N | synthetic small molecule | therapeutic | |
| faldaprevir, faldaprevir-sodium | BI-201335 | | synthetic small molecule | therapeutic | HCV NS3 PR |
| fedratinib | SAR-302503; TG-101348 | JOOXLOJCABQBSG-UHFFFAOYSA-N | synthetic small molecule | therapeutic | FLT3, JAK2 |
| grazoprevir | n/a | n/a | synthetic small molecule | therapeutic | |
| irinotecan-sucrosofate | MM-398, PEP-02 | n/a | natural product derived small molecule | therapeutic | topo 1 |
| luspatercept | ACE-536 | n/a | protein | therapeutic | TGF-B family |
| mavoglurant | AFQ-056 | ZFPZEYHRWGMJCV-ZHALLVOQSA-N | synthetic small molecule | therapeutic | mGluR5 |
| otlertuzumab | TRU-016 | n/a | monoclonal antibody | therapeutic | CD37 |
| ralpancizumab | RN317, PF-05335810 | n/a | monoclonal antibody | therapeutic | PC9 |
| romyelocel-l | CLT-008 | n/a | cellular therapy | therapeutic | n/a |
| roxadustat | FG-4592; ASP-1517 | YOZBGTLTNGAVFU-UHFFFAOYSA-N | synthetic small molecule | therapeutic | prolyl hydroxylase |
| simtuzumab | AB-0024; GS-6624 | n/a | monoclonal antibody | therapeutic | LOXL2 |
| sucroferric-oxyhydroxide | PA-21 | n/a | inorganic | sequestering agent | n/a |
| tecemotide | BLP-25 | n/a | peptide vaccine | peptide vaccine | n/a |

Saturday, 9 November 2013

Paper: The ChEMBL bioactivity database: an update

An update on what has happened to the Wellcome Trust-funded database ChEMBL over the past few years has just been published - it seems odd that we've been around long enough to achieve our 2nd NAR Database paper - so much more to do though! This paper contains features and content up to ChEMBL 17.

This could leave you in the difficult position of deciding which NAR paper to cite in your own publications using ChEMBL, so we suggest both! ;)

Oh, and it's Open Access, of course.

%J Nucleic Acids Research
%D 2013
%P 1–8 
%O doi:10.1093/nar/gkt1031
%T The ChEMBL bioactivity database: an update
%A A.P. Bento
%A A. Gaulton
%A A. Hersey
%A L.J. Bellis
%A J. Chambers
%A M. Davies
%A F.A. Krueger
%A Y. Light
%A L. Mak
%A S. McGlinchey
%A M. Nowotka
%A G. Papadatos 
%A R. Santos
%A J.P. Overington


Tuesday, 5 November 2013

Paper: The Functional Therapeutic Chemical Classification System

Here's an Open Access paper from Samuel in the group.

Drug repositioning is the discovery of new indications for compounds that have already been approved and used in a clinical setting. Recently, some computational approaches have been suggested to unveil new opportunities in a systematic fashion, by taking into consideration gene expression signatures or chemical features for instance. We present here a novel method based on knowledge integration using semantic technologies, to capture the functional role of approved chemical compounds.

In order to computationally generate repositioning hypotheses, we used the Web Ontology Language (OWL) to formally define the semantics of over 20,000 terms with axioms to correctly denote various modes of action (MoA). Based on an integration of public data, we have automatically assigned over a thousand approved drugs into these MoA categories. The resulting new research resource is called the Functional Therapeutic Chemical Classification System (FTC) and was further evaluated against the content of the traditional Anatomical Therapeutic Chemical Classification System (ATC). We illustrate how the new classification can be used to generate drug repurposing hypotheses, using Alzheimer's disease as a use-case.

A web application built on the top of the resource is freely available at The source code of the project is available at

%T The Functional Therapeutic Chemical Classification System
%D 2013
%J Bioinformatics
%A S. Croset
%A J.P. Overington
%A D. Rebholz-Schuhmann
%O Open Access

Sunday, 3 November 2013

Magic methyls and magic carpets

A few days ago, there was this post by Derek Lowe, reviewing a recent paper on magic methyls and their occurrence and impact in medicinal chemistry practice. They're called 'magic' because, although methyls are relatively insignificant in terms of size, polarity or lipophilicity, the addition of one to a compound can sometimes have a dramatic impact on its potency - much more than could be attributed to any simple desolvation effects.

More generally, the 'magic methyl' phenomenon pops up in discussions about the validity of the molecular similarity principle, descriptors, QSAR - almost everything in the applied Chemoinformatics field - and belongs to the general class of 'activity cliffs'. 

Methylation is a chemical transformation, and transformations along with their impact on a property of choice can be easily mined and studied using the so-called Matched Molecular Pairs analysis (MMPA). We already have a comprehensive database of all the matched pairs and transformations in ChEMBL, so it was relatively straightforward to extract all the methylations (H>>CH3) recorded in ChEMBL_17 and analyse their impact on binding affinity. (b.t.w., MMPs are coming to the ChEMBL interface soon, so look out for this feature if you are interested in this area).

So, in more detail, I extracted all the H>>CH3 pairs and joined them with their pActivities (Ki, IC50, EC50) against human proteins as reported in the literature (our data validity flags were quite useful in this case). The trick here is to only consider molecule pairs tested in the same assay, so that their respective activities are directly comparable and one can safely subtract one from the other.
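For those who like to see the logic spelled out, here is a toy Python sketch of the 'same assay' trick. It is not the query I actually ran against ChEMBL: the assay and molecule identifiers, the values and the function name `delta_pactivity` are invented, and the sign convention (methylated minus unmethylated) is an assumption made for this example.

```python
from collections import defaultdict

# Invented example data: (assay_id, molecule_id, pActivity).
activities = [
    ("assay_A", "mol_H", 6.0),
    ("assay_A", "mol_Me", 7.5),
    ("assay_B", "mol_H", 5.0),
    ("assay_C", "mol_Me", 8.0),  # no partner in assay_C, so it is ignored
]

# Invented H>>CH3 matched pairs: (unmethylated, methylated).
pairs = [("mol_H", "mol_Me")]


def delta_pactivity(activities, pairs):
    """Return (assay, delta) for each pair measured in the same assay."""
    by_assay = defaultdict(dict)
    for assay, mol, value in activities:
        by_assay[assay][mol] = value

    deltas = []
    for assay, measured in sorted(by_assay.items()):
        for parent, methylated in pairs:
            if parent in measured and methylated in measured:
                # Positive delta: the methylated analogue is more potent.
                deltas.append((assay, measured[methylated] - measured[parent]))
    return deltas


deltas = delta_pactivity(activities, pairs)
print(deltas)
```

Only assay_A contributes a data point here, because it is the only assay in which both members of the pair were measured.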

I ended up with 37,771 data points - many more than another recent publication that looked at this. Here's what the histogram of Delta pActivity (log units) looks like:

As you can see, the scale tilts slightly to the left of zero, meaning that methylation has an overall neutral to negative effect on binding affinity. This is not the first time people have seen this. There are, however, several examples (~2.3K out of 37.8K, to be exact) of magic methyls with a more than 10-fold increase in activity. More about this later.
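To put those numbers in perspective: a 10-fold increase in activity corresponds to a Delta pActivity of 1 log unit, and 2.3K out of 37.8K is roughly 6% of all pairs. A quick Python check, using the rounded counts quoted above rather than the exact ones:

```python
import math

fold_change = 10
delta_log_units = math.log10(fold_change)  # a 10-fold change is 1 log unit

# Rounded counts quoted in the text, not the exact numbers.
magic = 2300
total = 37800
fraction = magic / total

print("threshold in log units:", delta_log_units)
print("share of magic methyls: {:.1%}".format(fraction))
```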

Some of you will ask: 'OK, but what about the context? - methylation of a carbon, nitrogen or oxygen is not the same'. You're right, it's not. So I trellised the above plot by a perception of context - i.e. whether the methylation happens next to an aromatic/aliphatic C or N or next to an oxygen:
The same trend, more or less, is observed, with the exception of the aromatic carbon context, where methylation seems to have a more favourable effect than expected from the overall distribution. Perhaps that could be explained by the introduction of torsional and planarity changes, etc. For a more thorough explanation of this, see here.

Here are some examples of 'magic methyls' in the literature:

The take-home message is: magic methyls, unlike magic carpets, do exist, but there are just as many, or even more, 'nasty' methyls. However, both are just a rather small minority compared to the 'boring' methyls - i.e. methyls with minimal or zero impact on potency.

It's just human nature to remember the few exceptions and outliers and forget the vast evidence to the contrary. However, isolating and understanding such edge cases and black swans is what could make the difference in drug discovery. 


New Drug Approvals 2013 - Pt. XVII - Flutemetamol F18 (Vizamyl™)

ATC Code: V09AX04

On October 25th, the FDA approved Flutemetamol F18 (Tradename: Vizamyl; Research Code: [18F]AH110690 ), a radioactive diagnostic agent, for intravenous (i.v.) use in Positron Emission Tomography (PET) imaging of the brain in adult patients with cognitive impairment, who are being evaluated for Alzheimer’s disease (AD) and dementia.

Alzheimer's disease is a non-treatable, progressively worsening and fatal disease, characterised by a decrease in cognitive functions, such as memory, and is usually associated with an accumulation of β amyloid (Uniprot: P05067) plaques in several brain regions. These deposits are believed to be responsible for cellular damage and ultimately cell death.

Flutemetamol F18 is the second approved diagnostic drug to estimate β-amyloid neuritic plaque density, after the approval of Florbetapir F18 in 2012. Like Florbetapir F18, Flutemetamol F18 binds to β amyloid plaques in the brain where the F-18 isotope produces a positron signal that can be detected by a PET scanner. The advantages of this compound over its predecessor are: exposure to a lower dose of radiation; and more time for PET image acquisition (20 vs. 10 minutes). In in vitro binding studies using postmortem human brain homogenates containing fibrillar β amyloid, the dissociation constant (Kd) for flutemetamol was 6.7 nM.

It is worth mentioning that a positive scan, indicating the presence of β amyloid deposits, is not enough to diagnose a patient with Alzheimer's disease, since these protein deposits can also be present in patients with other types of dementia, or in elderly people without any neurological disease. However, a negative scan, where few or no β-amyloid plaques can be detected, indicates that the dementia is probably not due to Alzheimer's disease.

Flutemetamol F18 (IUPAC Name: 2-[3-fluoranyl-4-(methylamino)phenyl]-1,3-benzothiazol-6-ol; Canonical smiles: CNc1ccc(cc1[18F])c2nc3ccc(O)cc3s2 ; ChEMBL: CHEMBL2042122; PubChem: 15950376; ChemSpider: 13092196; Standard InChI Key: VVECGOCJFKTUAX-HUYCHCPVSA-N) is a synthetic small molecule with a radioactive isotope of fluorine (18F), a molecular weight of 274.3 Da, 3 hydrogen bond acceptors, 2 hydrogen bond donors and an ALogP of 3.61. The compound is therefore fully compliant with the rule of five.
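The rule-of-five statement is easy to check by hand. Here is a minimal Python sketch using the commonly quoted Lipinski thresholds; the function name is made up, and using ALogP in place of the calculated logP from the original rule is an assumption.

```python
def rule_of_five_violations(mol_weight, hbd, hba, logp):
    """Count violations of Lipinski's rule of five."""
    checks = [
        mol_weight > 500,  # molecular weight above 500 Da
        hbd > 5,           # more than 5 hydrogen bond donors
        hba > 10,          # more than 10 hydrogen bond acceptors
        logp > 5,          # logP above 5
    ]
    return sum(checks)


# Property values as quoted above for flutemetamol F18.
violations = rule_of_five_violations(mol_weight=274.3, hbd=2, hba=3, logp=3.61)
print(violations)
```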

Flutemetamol F18 is available as a radioactive solution for intravenous injection and the recommended imaging dose is 185 megabecquerels (MBq) [5 millicuries (mCi)] in a total volume of 10 mL or less. Following intravenous injection, the plasma concentration declines by approximately 75% in the first 20 minutes post-injection, and by approximately 90% in the first 180 minutes. Flutemetamol F18 metabolites are primarily excreted via the hepatobiliary (52%) and the renal system (37%).
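The two dose figures are the same quantity in different units: 1 Ci is 3.7 × 10^10 Bq, so 1 mCi is 37 MBq. A short Python sketch of the conversion (the constant and function names are just for illustration):

```python
MBQ_PER_MCI = 37.0  # 1 Ci = 3.7e10 Bq, so 1 mCi = 37 MBq


def mbq_to_mci(mbq):
    """Convert an activity in megabecquerels to millicuries."""
    return mbq / MBQ_PER_MCI


dose_mci = mbq_to_mci(185)
print(dose_mci)
```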

The license holder for Vizamyl™ is GE Healthcare, and the full prescribing information can be found here.