ChEMBL Resources


Tuesday, 31 December 2013

New Year, New Job? Research Associate in Epidemiology at UCL (ChEMBL related)

As part of a collaboration the ChEMBL group are involved in with UCL, we are looking to appoint to a full time, three-year position in the Genetic Epidemiology Group, Institute of Cardiovascular Science, one of the component institutes of the UCL Faculty of Population Health Sciences.

The appointee will join an exciting programme of work funded by the UCL National Institute of Health Research Biomedical Research Centre, through its High Impact Award scheme. The appointee will apply bioinformatic expertise to the late stage development of a new high density genotyping array designed to support drug target validation and related drug development issues; co-ordinate deployment of the array in a large consortium of highly-phenotyped cohort studies (the University College-London School of Hygiene-Edinburgh-Bristol consortium); undertake statistical analysis of the data; and play a leading role in writing manuscripts reporting findings arising from this work.

The post is based at UCL, but the work will involve collaboration with colleagues in Edinburgh, Bristol, the London School of Hygiene and Tropical Medicine, and the European Bioinformatics Institute at the Wellcome Trust Genome Campus in Hinxton, Cambridgeshire.

Key Requirements - The post would suit someone whose background is in genetic epidemiology, statistical genetics, or bioinformatics, particularly those whose expertise extends across two or more of these areas. We expect the appointee to have a PhD or equivalent degree. We are willing to consider applications from exceptional individuals who have only recently obtained their PhD degree, provided they can demonstrate the requisite skill. We place a strong emphasis on potential for independence and on supporting career development.

We particularly welcome female applicants and those from an ethnic minority, as they are under-represented within UCL at this level.
More details and how to apply are here.

Closing Date - 12 Jan 2014


Friday, 20 December 2013

Conference: CEADD2014 - Modelling water in biological systems, London, March 2014

A one day conference entitled 'Modelling Water in Biological Systems' will be held at the School of Oriental and African Studies (SOAS) in London on Friday, 28 March, 2014. This meeting, organised by the MGMS, is the latest in the 'Cutting Edge Approaches to Drug Design (CEADD)' series.

In recent years, significant progress has been made in probing the role of water molecules in protein-ligand binding.  Hydration is a crucial factor in understanding binding modes, ligand affinities and kinetics.  Modelling tools are becoming available which may offer new insights in this exciting and evolving area of current research. This conference provides a timely overview of some of the main research avenues in this important field.

Thursday, 19 December 2013

ChEMBL Web Service Update 4: A Reminder

This post is to remind users of the ChEMBL Web Services that we will soon be changing the backend to use the new ChEMBL API. Since our initial announcement about the changes, which you can read about here, here and here, we have made some more changes and optimisations, which speed up the services significantly.

We thank everyone for feedback to date and urge anyone else who makes use of the ChEMBL Web Services to test the new version. Remember, they are simple to test: just use the following temporary base URL and everything should work as if you were using the current live Web Services:

We would like to make the change in January, so please get in touch if you have any questions or experience any problems.

Once we have made the technology switch and are happy that it is working in the wild as expected, we will be doing a complete review of the functionality offered by the current ChEMBL Web Services. So expect some big changes in 2014.

Tuesday, 17 December 2013

UniChem: A resource for compound mapping - use in BioMedBridges

UniChem is a simple database and web service for the InChI-based linkage of chemical structures across various resources. It was initially developed under the EU-OPENSCREEN ESFRI as an approach to link screening data from the planned screening collection to other chemistry resources. The development was then extended under the BioMedBridges project, which spans various Biomedical Sciences (BMS) ESFRIs (such as ELIXIR, BBMRI, etc.).
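To make the idea of InChI-based linkage concrete, here is a minimal sketch in Python. It is not UniChem's schema or API: the source names, the identifiers and the two helper functions are made up for illustration. The only idea taken from the text is that records from different resources are joined on a structure-derived key (here a Standard InChIKey).

```python
from collections import defaultdict

# Toy records: (source name, source-specific identifier, Standard InChIKey).
# Source names and identifiers are hypothetical placeholders.
RECORDS = [
    ("source_a", "A-001", "VVECGOCJFKTUAX-HUYCHCPVSA-N"),
    ("source_b", "B-17", "VVECGOCJFKTUAX-HUYCHCPVSA-N"),
    ("source_c", "C-9", "VVECGOCJFKTUAX-HUYCHCPVSA-N"),
    ("source_a", "A-002", "JOOXLOJCABQBSG-UHFFFAOYSA-N"),
    ("source_c", "C-3", "JOOXLOJCABQBSG-UHFFFAOYSA-N"),
]


def link_by_inchikey(records):
    """Group source identifiers by the InChIKey they share."""
    by_key = defaultdict(dict)
    for source, identifier, inchikey in records:
        by_key[inchikey][source] = identifier
    return dict(by_key)


def cross_references(records, source, identifier):
    """Return identifiers in all other sources for the same structure."""
    for xrefs in link_by_inchikey(records).values():
        if xrefs.get(source) == identifier:
            return {s: i for s, i in xrefs.items() if s != source}
    return {}


print(cross_references(RECORDS, "source_a", "A-002"))
```

Because the join key is computed from the structure itself, a new resource can be linked in without any pairwise mapping work.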

It has proved to be remarkably useful to us as well, and will be the future home of regularly updated feeds of compound structures from SureChEMBL. This will allow rapid novelty checking of patent structures across the component data sources. A side-effect of this is, of course, that the compounds in any of the BioMedBridges partner ESFRIs immediately gain patent data integration. For us, this synergy, and the snowball effect of binding resources together using simple open standards, is one of the great joys of our work!

Follow @SureChEMBL for ongoing updates on status of the SureChEMBL resource.

Friday, 13 December 2013

Notes from Rita's Talk Yesterday.

Rita gave a talk on her recent drug target work yesterday on campus, and Jenny Cham took notes; aren't they great?


Wednesday, 11 December 2013

SureChEMBL - Chemical Structure Information in Patents

Today we have announced that we are taking over the running of the SureChem system from Digital Science. We have renamed it SureChEMBL to reflect the history and provenance of the technology and engineering, but also to align it with its new home and future. We like the name, and hope you do too. We are delighted that this has happened - Nicko and the team at Digital Science have been great, and the more we have dug into how it works, the more we have appreciated the design and vision that they had.

If there is one consistent piece of feedback we get about ChEMBL it is in encouraging us to add patent data to what we do. So now we have, but because the data from patents is different in detail from that reported in the published literature, we will keep the databases separate, but closely integrated.

For those of you who are already SureChem users, you will be familiar with the functionality and how it works. For those who aren't: SureChEMBL takes feeds of full text patents, identifies chemical objects from either the in-line text or from images and adds 2-D chemical structures. This is then loaded into a database and is searchable by chemical structure, so you can do substructure, similarity searching and so forth - all the good things you'd expect from a chemical database. This chemical search functionality is unavailable from the public, published patent documents, and is really essential for anyone seriously using the patent literature. Oh, and the system does this live, so as patents are published, they are processed and added to the system - the delay between publication and structures being available in SureChEMBL is about a day when converted from text, and a few days when converted from image sources.
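For readers new to structure searching, similarity search usually boils down to comparing fingerprints with a Tanimoto coefficient. The sketch below shows just that comparison in plain Python. It is not how SureChEMBL does it (that is the job of the chemical cartridge); the fingerprints are hand-made sets of "on" bits and the compound names are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def similarity_search(query_fp, database, threshold=0.7):
    """Return (name, score) pairs at or above the threshold, best first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in database.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy database: names and bit positions are illustrative only.
DATABASE = {
    "cmpd_1": {1, 2, 3, 4},
    "cmpd_2": {1, 2, 3, 5},
    "cmpd_3": {7, 8},
}

print(similarity_search({1, 2, 3, 4}, DATABASE, threshold=0.5))
```

In a real system the fingerprints would come from a cheminformatics toolkit and the search would run inside the database rather than in a Python loop.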

SureChEMBL is hosted on the cloud - it's quite a complicated AWS solution, and it will take a few months for us to assume complete control of all the various parts and, importantly, to keep things running smoothly behind the scenes, so that continuous access to fresh patent data is maintained.

SureChEMBL uses a number of third-party software products in its operation, and arranging the licenses and permissions has been complex, and is still ongoing. The third-party software and data feeds used in SureChEMBL include:

Name to structure: ChemAxon, ACD/Labs, Perkin Elmer, OpenEye, OPSIN, NextMove
Chemical cartridge: ChemAxon
Image to structure: Key Module
Patent data: FairView (IFI Claims) – processed patents, TwinDolphin – patent PDFs

These guys have all been a pleasure to work with so far, and SureChEMBL is a great showcase of their respective technologies and data.

We will host the system at the primary urls and also at - at the moment , these redirect to, but as we switch things over they will point to servers provisioned by our team, so please start using these new urls for future access, although the original urls will continue to work into the future.

One of the more complicated things to transfer is the user accounts system - we can't simply transfer them over - and so have a plan to mail batches of users once a new sign-on system is in place in order to invite them to sign up to the new user account system. If you are not currently a registered user, please sign up with the current system, and we'll invite you to transfer over to our sign-on system once things are ready.

The EMBL-EBI has a broad range of life-science chemistry resources, and we integrate across chemistry related content using a chemical structure integration system called UniChem. In overview the EMBL-EBI chemistry resources include the following.

The future? - well the future is exciting, and we have lots of ideas to actively develop the SureChEMBL system. To be clear though, doing this will rely on us getting funding, and we're working hard on this. Some of the ideas we have for SureChEMBL include:
  • Put SureChEMBL chemical content into UniChem
  • Add sequence searching
  • Add disease term, animal model, etc. indexing
  • Development of community KNIME nodes
  • Add links to/from Europe PMC
  • Ligand Ensemble-based mapping of ChEMBL literature to patents
  • Refactor interface for EMBL look and feel
  • Extend image extraction retrospectively from 2006 using spot priced compute from AWS
  • Provide weekly/monthly feed of patent structures to PubChem
  • Add chemical structure tagging & search to full text content of Europe PMC
But one of the first things we plan to do is index genes and targets (in collaboration with local SME SciBite) and provide an RDF form of the data and REST web services as part of the IMI OpenPHACTS project.

In the new year, we will run a webinar on SureChEMBL (which we will announce here), but in the meantime we're very happy to take questions on the SureChEMBL support email address surechembl-help (at)


Friday, 29 November 2013

A call for new MMV Malaria Box screening data depositions

Last year, MMV released the MMV Malaria Box, a physical set of 400 probe- and drug-like compounds with confirmed anti-malarial activity. The 'Box' has since been distributed to a large number of academic labs around the world, where the compounds are screened against other plasmodia strains and pathogens such as schistosoma and mTB. The assay results have started coming back in the form of data depositions and we, as MMV partners, are doing our best to integrate them into both the malaria-data database and the main ChEMBL one. Recent examples of such MMV Malaria Box screening data depositions include:

In addition, we curate and integrate the bioactivity data produced by the excellent Open Source Malaria project.

The value of sharing screening data openly, especially in the field of NTD basic research, could not be emphasised more, as it:
  • minimises the duplication of effort among labs
  • accelerates research outcomes
  • leads to more informed decisions
  • fosters synergies and collaborations among researchers
  • shifts the focus of competition to between ideas as opposed to data access rights
Therefore, we would like to encourage you to deposit your NTD/MMV box screening data, both positive AND negative, regardless of whether they have been published or not, to ChEMBL. We will make sure that your data are appropriately integrated, searchable and downloadable with their provenance visible and properly acknowledged. More importantly, we will make sure your data are open and freely shareable by everyone. 

If you would like to deposit your data here or enquire further, please get in contact.

The ChEMBL Team

Thursday, 21 November 2013

Paper: myChEMBL - a virtual machine implementation of open data and cheminformatic tools

We have just had a paper published in Bioinformatics on myChEMBL - the Linux VM that contains a fully functional version of the ChEMBL database. The paper is here.

myChEMBL is available for download at:

A warning: it is a fairly big download (ca. 18 GB), so try to do this over a fast, stable connection.

Source code is available here:

%T myChEMBL: A virtual machine implementation of open data and cheminformatics tools
%J Bioinformatics
%D 2013
%O DOI:10.1093/bioinformatics/btt666
%A M. Davies
%A G. Papadatos
%A F. Atkinson
%A J.P. Overington

Job: Chemoinformatician at the Karolinska

Some of our collaborators at the Karolinska have a great job available - the advert is here.

Division: Chemical Biology Consortium Sweden (CBCS) are looking for a highly motivated and talented Cheminformatics Scientist to support and coordinate a wide scope of research informatics applications and data analysis at our Stockholm facilities.

Duties: The desired candidate will have a demonstrated track record in managing large volumes of scientific data in support of basic research and/or drug discovery projects and should have significant experience with in-house and commercial software solutions that facilitate data capture, analysis and visualization in small molecule research and drug design. Responsibilities include:
  • Evaluation and implementation of a nationally encompassing chemoinformatics system for the SciLifeLab community.
  • Maintenance, configuration, monitoring, and/or troubleshooting of scientific applications and underlying software.
  • Partnering and interaction with software vendors and external professional service staff to resolve issues and implement enhancements.
  • Assist scientists from multiple disciplines with data capture, analysis, visualization.

Wednesday, 20 November 2013

Competition Time - Teach-Discover-Treat 2014

Teach-Discover-Treat (TDT) is excited to announce our 2014 Competition. We have four exciting challenges that focus on developing and disseminating computational workflows for drug discovery of neglected diseases with a premium on reproducibility.

Three cash prizes - plus partial reimbursement of travel - will be awarded! Winners are required to present their work at the TDT Award symposium during the Fall 2014 ACS National Meeting in San Francisco, California.

Create and submit computational workflows that inspire drug discovery activities using freely available software tools. Detailed information about the 2014 Competition can be found here:

Submissions deadline is February 3, 2014.

The TDT Steering Committee

Hanneke Jansen, Rommie Amaro, Jane Tseng, Wendy Cornell, Patrick Walters and Emilio Xavier Esposito


Friday, 15 November 2013

New Drug Approvals 2013 - Pt. XIX - Ibrutinib (ImbruvicaTM)

ATC Code:
Wikipedia: Ibrutinib

On November 13, 2013, the FDA approved Ibrutinib (ImbruvicaTM) for the treatment of patients with mantle cell lymphoma (MCL) who have received at least one prior therapy. MCL is a subtype of B-cell lymphoma and accounts for 6% of non-Hodgkin's lymphoma cases. In an open-label, multi-center, single-arm trial of 111 previously treated patients, Ibrutinib showed a 65.8% response rate.

Ibrutinib is an irreversible inhibitor of the Tyrosine-protein kinase BTK (Uniprot:Q06187; ChEMBL:CHEMBL5251; canSAR target synopsis) and is the first approved targeted BTK inhibitor. It forms a covalent bond with a cysteine residue in the BTK active site via a Michael acceptor mechanism, leading to inhibition of BTK enzymatic activity.

Ibrutinib (ChEMBL:CHEMBL1873475; canSAR drug synopsis; also known as CRA-032765 and PCI-32765) has the formula C25H24N6O2 and a molecular weight of 440.50. It is absorbed after oral administration with a median Tmax of 1-2 hours. After administration of a 560 mg dose, the observed AUC is 953 ± 705 ng⋅h/mL. The apparent volume of distribution at steady state (Vd,ss/F) is approximately 10000 L and the half-life is 4 to 6 hours.
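To put the 4 to 6 hour half-life in context, here is a small back-of-the-envelope sketch in Python. It assumes simple first-order (mono-exponential) elimination, which is my simplification and not something stated in the prescribing information; the function name is made up for illustration.

```python
def fraction_remaining(hours_elapsed, half_life_hours):
    """Fraction of drug left after a given time, assuming first-order elimination."""
    return 0.5 ** (hours_elapsed / half_life_hours)


# Bounds of the reported half-life range (4 to 6 hours), evaluated at 24 hours.
for half_life in (4.0, 6.0):
    remaining = fraction_remaining(24.0, half_life)
    print(f"half-life {half_life} h: {remaining:.2%} remaining after 24 h")
```

Under that assumption, between roughly 1.6% and 6.3% of a dose would remain 24 hours after administration.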

ImbruvicaTM is produced by Pharmacyclics, Inc.

The full Prescribing Information is here

New Drug Approvals 2013 - Pt. XVIII - Obinutuzumab (GazyvaTM)

ATC Code: L01XC15
Wikipedia: Obinutuzumab

On November 1, 2013 the FDA approved obinutuzumab (GazyvaTM) for use in combination with chlorambucil (a nitrogen mustard alkylating agent) for the treatment of patients with previously untreated chronic lymphocytic leukemia (CLL). CLL is the most common type of leukaemia, accounting for 35% of all reported leukaemias (see CRUK CLL page). In a randomized three-arm clinical study, obinutuzumab in combination with chlorambucil improved the progression-free survival (PFS) of patients to 23.0 months, compared to 11.1 months for chlorambucil alone.

Obinutuzumab (CHEMBL1743048) is a humanized anti-CD20 monoclonal antibody of ca. 150 kDa molecular weight. Its target, the B-lymphocyte antigen CD20, is the product of the gene MS4A1 (Uniprot: P11836; ChEMBL: CHEMBL2058; canSAR target synopsis). The CD20 antigen is expressed on the surface of pre B- and mature B-lymphocytes. Obinutuzumab mediates B-cell lysis through three main routes:
  • Engagement of immune effector cells, resulting in antibody-dependent cellular cytotoxicity and antibody-dependent cellular phagocytosis
  • Direct activation of intracellular death signaling pathways
  • Activation of the complement cascade.

The geometric mean (CV%) volume of distribution of obinutuzumab at steady state is approximately 3.8 L. The terminal clearance is 0.09 (46%) L/day and the terminal half-life is ~28.4 days.
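As a rough sanity check on these numbers, the textbook one-compartment relationship t½ = ln(2) × Vd / CL can be applied to the reported volume of distribution and clearance. This is only an approximation of my own choosing (the label values come from a population pharmacokinetic analysis, not from this formula), and the function name is hypothetical.

```python
import math


def half_life_days(volume_l, clearance_l_per_day):
    """One-compartment estimate: t1/2 = ln(2) * Vd / CL."""
    return math.log(2) * volume_l / clearance_l_per_day


estimate = half_life_days(3.8, 0.09)
print(f"estimated half-life: {estimate:.1f} days (reported: ~28.4 days)")
```

The estimate of about 29 days lands close to the reported ~28.4 days, so the three quoted values are at least mutually consistent.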

Obinutuzumab has been issued with a boxed warning because of the following observed events: reactivation of Hepatitis B Virus (HBV), in some cases resulting in fulminant hepatitis, hepatic failure, and death; and Progressive Multifocal Leukoencephalopathy (PML) resulting in death.

GazyvaTM is produced by Genentech, Inc. The full Prescribing Information is here.

Thursday, 14 November 2013

New ChEMBL-NTD Depositions

We are very pleased to announce the release of two new datasets on the ChEMBL-NTD portal.

The first dataset is provided by the Drugs for Neglected Diseases initiative (DNDi) and is focused on the selection and optimization of hits from a high-throughput phenotypic screen against Trypanosoma cruzi. The paper describing the dataset in more detail can be accessed here and the data can be downloaded from here.
The second dataset is from the DeRisi Lab at UCSF and is focused on the screening of MMV's Malaria Box compounds in Plasmodium falciparum, to understand if anti-malarial compounds target the apicoplast organelle. More details about the dataset can be found here and the data can be downloaded from here.

Both datasets will be loaded into the next version of ChEMBL, which will be due out early next year.

The ChEMBL - Neglected Tropical Disease portal is a repository for Open Access primary screening and medicinal chemistry data directed at neglected diseases. If you would like to deposit your data here please get in contact.

The ChEMBL Team

Tuesday, 12 November 2013

RDKit and Raphael.js

The ChEMBL group had the honour of hosting the second RDKit UGM. It was a great way to catch up with the RDKit community, find out what they are working on and learn about new features the toolkit offers. We gave two talks during the meeting, so if you want to know how Clippy can make interacting with different chemical formats on your desktop easier, go here, and if you want to learn about wrapping RDKit up in a RESTful Web Service a.k.a. Beaker (to be described in a future blog post), go here. Many discussions about new features RDKit could offer were had throughout the meeting and one which caught my attention was support for plotting compound images on HTML5 Canvas.

Unable to participate in the hackathon held on the final day, I set about hosting my own small hackathon during the weekend (only 1 attendee). The result of this weekend coding effort was a pull request made against the RDKit GitHub repo, introducing a new class called JSONCanvas.

Technical Details

In the past, the usual model for generating compound images relied on the server sending a binary representation of the compound (e.g. .png, .jpeg) to the client. With advances in browser technologies, it is now feasible to rely on the client to generate the graphical representation of the compound, as it now has access to many methods that allow it to handle geometrical primitives. It can decide whether those primitives should be rendered as SVG, VML or even HTML5 Canvas (check out Kinetic.js for HTML5 canvas rendering, as it knows how to draw some core shapes on canvas).

My solution uses Raphael.js - a JavaScript library for drawing vector graphics in the browser. To display graphics it uses SVG on browsers that support this format; on older browsers it falls back to VML. In the library documentation we can find a very interesting method called Paper.add(). This method accepts JSON containing an array of geometrical objects (such as circle, rectangle, path) to be displayed and returns a handle for manipulating (moving, rotating, scaling) the object as a whole. This means that if we could create a JSON object which uses shapes to represent a chemical compound, we could draw it or manipulate the compound directly. The new JSONCanvas class produces the previously described JSON object for any* given RDKit compound.

(*I am sure we might find a couple of exceptions)
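To give a feel for what such a JSON object might look like, here is a minimal sketch in Python. It does not use RDKit and it is not the actual output of JSONCanvas; the `molecule_to_shapes` helper, the toy coordinates and the choice to draw carbons as bare line ends are all my own assumptions. What it does follow is the general shape of what Paper.add() accepts: an array of objects, each with a `type` (such as `path` or `text`) plus that element's attributes.

```python
import json


def molecule_to_shapes(atoms, bonds, scale=40):
    """Turn 2D atom coordinates and bonds into Raphael-style shape dictionaries.

    atoms: list of (symbol, x, y); bonds: list of (atom_index, atom_index).
    """
    shapes = []
    for i, j in bonds:
        (_, x1, y1), (_, x2, y2) = atoms[i], atoms[j]
        shapes.append({
            "type": "path",
            "path": f"M{x1 * scale},{y1 * scale}L{x2 * scale},{y2 * scale}",
            "stroke": "#000",
        })
    for symbol, x, y in atoms:
        if symbol != "C":  # carbons are left implicit, as in skeletal formulae
            shapes.append({"type": "text", "x": x * scale, "y": y * scale, "text": symbol})
    return shapes


# Toy ethanol-like layout: C-C-O on a straight line (coordinates are illustrative).
ATOMS = [("C", 1, 1), ("C", 2, 1), ("O", 3, 1)]
BONDS = [(0, 1), (1, 2)]

print(json.dumps(molecule_to_shapes(ATOMS, BONDS), indent=2))
```

On the client, the resulting array would be handed to Paper.add(), which returns a set that can then be moved, rotated or scaled as a whole.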

But why?

1. Cost - less server-side processing, since the server no longer has to rasterise the image (which often requires third-party drawing libraries as well).

2. Bandwidth - less bandwidth is required to transfer a JSON representation of compounds. Also, as it is text-based, you can employ further compression (by configuring your server to send gzipped JSON, which most modern browsers understand) or use AMF.

3. Accuracy - improved scaling quality made possible with vector graphics.

4. Interactivity - compounds rendered using JSON on the client side can handle standard events such as click, hover, etc. Complex operations (animating, sorting, dragging,...), can also be applied to these objects.
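The bandwidth point is easy to check for yourself. The sketch below, using only the Python standard library, compares the size of a JSON payload before and after gzip compression. The payload is a made-up list of path shapes rather than real JSONCanvas output, and in practice the compression would be done by the web server rather than in application code.

```python
import gzip
import json


def gzip_json(payload):
    """Serialise a payload to JSON and gzip it, returning (raw_bytes, gzipped_bytes)."""
    raw = json.dumps(payload).encode("utf-8")
    return raw, gzip.compress(raw)


# Illustrative payload: 200 simple line segments.
SHAPES = [{"type": "path", "path": f"M{i},0L{i},10", "stroke": "#000"} for i in range(200)]

raw, packed = gzip_json(SHAPES)
print(f"raw: {len(raw)} bytes, gzipped: {len(packed)} bytes")
```

Shape descriptions are very repetitive (the same keys and attribute values over and over), which is exactly the kind of text gzip handles well.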


As an example of this technique, please look at our chemical game. To give you some idea of scale and performance, the game loads 1000 compounds when the page first loads. If you want to see a raw example, please explore the source of my demo page. Other examples could include:

1.  Online compound cloud (similar to tag cloud but with compound images instead of words). Such a cloud can be used to visualise compound similarity.

2. Compound stream - a substructure search can sometimes return a very large number of results. Such results can be represented as a pseudo-infinite stream of compounds: only a small portion of the results is presented on the screen, but scrolling down causes more results to be rendered while older ones are discarded.
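The compound stream idea is essentially a sliding window over the result list. A minimal sketch of the windowing logic follows, written in Python for brevity even though a real implementation would live in the browser; the function name and the default window size are hypothetical.

```python
def visible_window(results, scroll_index, window_size=20):
    """Return the slice of results that should currently be rendered.

    The start index is clamped so the window never runs past either end.
    """
    start = max(0, min(scroll_index, len(results) - window_size))
    return results[start:start + window_size]


# Pretend a substructure search returned 1000 hits.
hits = list(range(1000))
print(visible_window(hits, 0)[:3])    # top of the stream
print(visible_window(hits, 500)[:3])  # after scrolling down
```

Only the compounds in the current window need to be drawn, so the cost of rendering stays constant however many hits the search returns.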

How can I use it?

1. You can download my fork of RDKit containing all relevant changes.

2. Today Greg Landrum, the RDKit creator, made his own branch containing a modified version of the original pull request, so hopefully this is on its way to being accepted into the master branch in future.

As a group we are happy to participate in such a great open source library!


Sunday, 10 November 2013

USAN Watch: September 2013

The USANs for September, 2013 have recently been published. We actually missed September, due to switch over in service for the INNs, but now they're here.

USAN | Research Code | InChIKey (Parent) | Drug Class | Therapeutic Class | Target
aducanumab | BIIB-037 | n/a | monoclonal antibody | therapeutic | beta amyloid
aptorsen-sodium | OGX-427 | n/a | oligonucleotide | therapeutic | HSP27
asfotase-alfa | ALXN-1215, ENB-0040 | n/a | enzyme | therapeutic | n/a
batefenterol / batefenterol-succinate | GSK-961081A | URWYQGVSPQJGGB-DHUJRADRSA-N | synthetic small molecule | therapeutic | Muscarinic receptors, B2 receptor
bococizumab | RN-316, PF-04950615 | n/a | monoclonal antibody | therapeutic | PC9
dactolisib / dactolisib-tosylate | NVP-BEZ235-NX | JOGKUKXHTYWRGZ-UHFFFAOYSA-N | synthetic small molecule | therapeutic | MTOR, PI3K
deldeprevir / deldeprevir-sodium | ACH-0142684, ACH-2684 | UDMJANYPQWEDFT-ZAWFUYGJSA-N | synthetic small molecule | therapeutic | HCV NS3 PR
etiguanfacine | SSP-1871 | NWKJFUNUXVXYGE-UHFFFAOYSA-N | synthetic small molecule | therapeutic |
faldaprevir / faldaprevir-sodium | BI-201335 | | synthetic small molecule | therapeutic | HCV NS3 PR
fedratinib | SAR-302503; TG-101348 | JOOXLOJCABQBSG-UHFFFAOYSA-N | synthetic small molecule | therapeutic | FLT3, JAK2
grazoprevir | n/a | n/a | synthetic small molecule | therapeutic |
irinotecan-sucrosofate | MM-398, PEP-02 | n/a | natural product derived small molecule | therapeutic | topo 1
luspatercept | ACE-536 | n/a | protein | therapeutic | TGF-B family
mavoglurant | AFQ-056 | ZFPZEYHRWGMJCV-ZHALLVOQSA-N | synthetic small molecule | therapeutic | mGluR5
otlertuzumab | TRU-016 | n/a | monoclonal antibody | therapeutic | CD37
ralpancizumab | RN317, PF-05335810 | n/a | monoclonal antibody | therapeutic | PC9
romyelocel-l | CLT-008 | n/a | cellular therapy | therapeutic | n/a
roxadustat | FG-4592; ASP-1517 | YOZBGTLTNGAVFU-UHFFFAOYSA-N | synthetic small molecule | therapeutic | prolyl hydroxylase
simtuzumab | AB-0024; GS-6624 | n/a | monoclonal antibody | therapeutic | LOXL2
sucroferric-oxyhydroxide | PA-21 | n/a | inorganic | sequestering agent | n/a
tecemotide | BLP-25 | n/a | peptide vaccine | peptide vaccine | n/a

Saturday, 9 November 2013

Paper: The ChEMBL bioactivity database: an update

An update on what has happened to the Wellcome Trust funded database ChEMBL over the past few years has just been published. It seems odd that we've been around long enough to achieve our 2nd NAR Database paper - so much more to do though! This paper covers features and content up to ChEMBL 17.

This could put you in a difficult position over which NAR paper to cite in your own publications using ChEMBL, so we suggest both! ;)

Oh, and it's Open Access, of course.

%J Nucleic Acids Research
%D 2013
%P 1–8 
%O doi:10.1093/nar/gkt1031
%T The ChEMBL bioactivity database: an update
%A A.P. Bento
%A A. Gaulton
%A A. Hersey
%A L.J. Bellis
%A J. Chambers
%A M. Davies
%A F.A. Krueger
%A Y. Light
%A L. Mak
%A S. McGlinchey
%A M. Nowotka
%A G. Papadatos 
%A R. Santos
%A J.P. Overington


Tuesday, 5 November 2013

Paper: The Functional Therapeutic Chemical Classification System

Here's an Open Access paper from Samuel in the group.

Drug repositioning is the discovery of new indications for compounds that have already been approved and used in a clinical setting. Recently, some computational approaches have been suggested to unveil new opportunities in a systematic fashion, by taking into consideration gene expression signatures or chemical features for instance. We present here a novel method based on knowledge integration using semantic technologies, to capture the functional role of approved chemical compounds.

In order to computationally generate repositioning hypotheses, we used the Web Ontology Language (OWL) to formally define the semantics of over 20,000 terms with axioms to correctly denote various modes of action (MoA). Based on an integration of public data, we have automatically assigned over a thousand approved drugs into these MoA categories. The resulting new research resource is called the Functional Therapeutic Chemical Classification System (FTC) and was further evaluated against the content of the traditional Anatomical Therapeutic Chemical Classification System (ATC). We illustrate how the new classification can be used to generate drug repurposing hypotheses, using Alzheimer's disease as a use-case.

A web application built on the top of the resource is freely available at The source code of the project is available at

%T The Functional Therapeutic Chemical Classification System
%D 2013
%J Bioinformatics
%A S. Croset
%A J.P. Overington
%A D. Rebholz-Schuhmann
%O Open Access

Sunday, 3 November 2013

Magic methyls and magic carpets

A few days ago, there was this post by Derek Lowe, reviewing a recent paper on magic methyls and their occurrence and impact in medicinal chemistry practice. They're called 'magic' because, although methyls are relatively insignificant in terms of size, polarity or lipophilicity, the addition of one to a compound can sometimes have a dramatic impact on its potency - much more than would be attributed to any simple desolvation effects.

More generally, the 'magic methyl' phenomenon pops up in discussions about the validity of the molecular similarity principle, descriptors, QSAR - almost everything in the applied Chemoinformatics field - and belongs to the general class of 'activity cliffs'. 

Methylation is a chemical transformation, and transformations along with their impact on a property of choice can be easily mined and studied using the so-called Matched Molecular Pairs analysis (MMPA). We already have a comprehensive database of all the matched pairs and transformations in ChEMBL, so it was relatively straightforward to extract all the methylations (H>>CH3) recorded in ChEMBL_17 and analyse their impact on binding affinity. (By the way, MMPs are coming to the ChEMBL interface soon, so look out for this feature if you are interested in this area.)

So, in more detail, I extracted all the H>>CH3 pairs and joined them with their pActivities (Ki, IC50, EC50) against human proteins as reported in the literature (our data validity flags were quite useful in this case). The trick here is to only consider molecule pairs tested against the same assay, so that their respective activities are directly comparable and one can safely subtract one from the other.
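The same-assay trick described above can be sketched in a few lines of Python. This is not the query I actually ran against ChEMBL; the data layout, the identifiers and the `delta_pactivities` helper are invented purely to show the logic of only subtracting activities measured in the same assay.

```python
from collections import defaultdict


def delta_pactivities(pairs, activities):
    """Compute pActivity(methylated) - pActivity(parent) within shared assays.

    pairs: iterable of (parent_id, methylated_id)
    activities: iterable of (assay_id, molecule_id, pactivity)
    """
    by_molecule = defaultdict(dict)
    for assay_id, molecule_id, pactivity in activities:
        by_molecule[molecule_id][assay_id] = pactivity

    deltas = []
    for parent, methylated in pairs:
        # Only assays in which both molecules were tested are comparable.
        shared = by_molecule[parent].keys() & by_molecule[methylated].keys()
        for assay_id in sorted(shared):
            delta = by_molecule[methylated][assay_id] - by_molecule[parent][assay_id]
            deltas.append((parent, methylated, assay_id, delta))
    return deltas


# Toy data: one H>>CH3 pair, tested together only in assay_1.
PAIRS = [("mol_H", "mol_Me")]
ACTIVITIES = [
    ("assay_1", "mol_H", 6.0),
    ("assay_1", "mol_Me", 7.5),
    ("assay_2", "mol_H", 5.0),
    ("assay_3", "mol_Me", 8.0),
]

print(delta_pactivities(PAIRS, ACTIVITIES))
```

A positive delta means the methylated compound is more potent; measurements from assay_2 and assay_3 are ignored because only one member of the pair was tested there.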

I ended up with 37,771 data points - many more than another recent publication that looked at this. Here's what the histogram of Delta pActivity (log units) looks like:

As you can see, the scale tilts slightly to the left of zero, meaning that methylation has an overall neutral to negative effect on binding affinity. This is not the first time people have seen this. There are indeed, however, several examples (~2.3K out of 37.8K, to be exact) of magic methyls with more than 10-fold increase in activity. More about this later.
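For anyone wanting to relate fold changes to the log-unit axis of the histogram, a 10-fold increase in activity corresponds to a Delta pActivity of +1. The short Python sketch below makes that conversion explicit and works out the approximate share of magic methyls; note that ~2.3K is a rounded figure, so the resulting fraction is approximate too.

```python
import math


def fold_change_to_delta(fold_change):
    """Convert a fold change in activity into a difference in pActivity (log10 units)."""
    return math.log10(fold_change)


TOTAL_PAIRS = 37771
MAGIC_PAIRS = 2300  # rounded count quoted in the text (~2.3K)

print(f"10-fold increase = {fold_change_to_delta(10):+.1f} log units")
print(f"magic methyl fraction = {MAGIC_PAIRS / TOTAL_PAIRS:.1%}")
```

So the magic methyls make up roughly 6% of all the methylation pairs analysed.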

Some of you will ask: 'OK, but what about the context? - methylation of a carbon, nitrogen or oxygen is not the same'. You're right, it's not. So I trellised the above plot by a perception of context - i.e. whether the methylation happens next to an aromatic/aliphatic C or N or next to an oxygen:

The same trend, more or less, is observed, with the exception of the aromatic carbon context, where methylation seems to have a more favourable effect than expected from the overall distribution. Perhaps that could be explained by the introduction of torsional and planarity changes, etc. For a more thorough explanation of this, see here.

Here are some examples of 'magic methyls' in the literature:

The take home message is: Magic methyls, unlike magic carpets, do exist but there are also equally as many, or even more, 'nasty' methyls. However, both of them are just a rather small minority compared to the 'boring' methyls - i.e. methyls with minimal or zero impact on potency.

It's just human nature to remember the few exceptions and outliers and forget the vast evidence to the contrary. However, isolating and understanding such edge cases and black swans is what could make the difference in drug discovery. 


New Drug Approvals 2013 - Pt. XVII - Flutemetamol F18 (VizamylTM)

ATC Code: V09AX04

On October 25th, the FDA approved Flutemetamol F18 (Tradename: Vizamyl; Research Code: [18F]AH110690 ), a radioactive diagnostic agent, for intravenous (i.v.) use in Positron Emission Tomography (PET) imaging of the brain in adult patients with cognitive impairment, who are being evaluated for Alzheimer’s disease (AD) and dementia.

Alzheimer's disease is a non-treatable, progressively worsening and fatal disease, characterised by a decrease in cognitive functions, such as memory, and is usually associated with an accumulation of β amyloid (Uniprot: P05067) plaques in several brain regions. These deposits are believed to be responsible for cellular damage and ultimately cell death.

Flutemetamol F18 is the second approved diagnostic drug to estimate β-amyloid neuritic plaque density, after the approval of Florbetapir F18 in 2012. Like Florbetapir F18, Flutemetamol F18 binds to β amyloid plaques in the brain where the F-18 isotope produces a positron signal that can be detected by a PET scanner. The advantages of this compound over its predecessor are: exposure to a lower dose of radiation; and more time for PET image acquisition (20 vs. 10 minutes). In in vitro binding studies using postmortem human brain homogenates containing fibrillar β amyloid, the dissociation constant (Kd) for flutemetamol was 6.7 nM.

It is worth mentioning that a positive scan, indicating the presence of β amyloid deposits, is not enough to diagnose a patient with Alzheimer's disease, since these protein deposits can also be present in patients with other types of dementia, or in elderly people without any neurological disease. However, a negative scan, in which few or no β-amyloid plaques can be detected, indicates that the dementia is probably not due to Alzheimer's disease.

Flutemetamol F18 (IUPAC Name: 2-[3-fluoranyl-4-(methylamino)phenyl]-1,3-benzothiazol-6-ol; Canonical smiles: CNc1ccc(cc1[18F])c2nc3ccc(O)cc3s2 ; ChEMBL: CHEMBL2042122; PubChem: 15950376; ChemSpider: 13092196; Standard InChI Key: VVECGOCJFKTUAX-HUYCHCPVSA-N) is a synthetic small molecule containing a radioactive isotope of fluorine (18F). It has a molecular weight of 274.3 Da, 3 hydrogen bond acceptors, 2 hydrogen bond donors, and an ALogP of 3.61. The compound is therefore fully compliant with the rule of five.
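The rule-of-five statement can be checked by hand from the properties quoted above. The sketch below uses the usual Lipinski thresholds and takes ALogP as the logP estimate, which is an assumption (the original rule was formulated with a calculated logP); the function name is illustrative.

```python
def rule_of_five_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count Lipinski rule-of-five violations from precomputed properties."""
    checks = [
        mol_weight > 500,   # molecular weight above 500 Da
        logp > 5,           # lipophilicity (logP) above 5
        h_donors > 5,       # more than 5 hydrogen bond donors
        h_acceptors > 10,   # more than 10 hydrogen bond acceptors
    ]
    return sum(checks)

# Values quoted above for Flutemetamol F18.
violations = rule_of_five_violations(
    mol_weight=274.3, logp=3.61, h_donors=2, h_acceptors=3
)
print(violations)  # 0
```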

Flutemetamol F18 is available as a radioactive solution for intravenous injection and the recommended imaging dose is 185 megabecquerels (MBq) [5 millicuries (mCi)] in a total volume of 10 mL or less. Following intravenous injection, the plasma concentration declines by approximately 75% in the first 20 minutes post-injection, and by approximately 90% in the first 180 minutes. Flutemetamol F18 metabolites are primarily excreted via the hepatobiliary (52%) and the renal system (37%).
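The two dose figures are the same quantity in different units. Since 1 curie is defined as 3.7 x 10^10 becquerels, 1 mCi equals 37 MBq; a small sketch of the conversion (the helper name is illustrative):

```python
MBQ_PER_MCI = 37.0  # 1 Ci = 3.7e10 Bq, hence 1 mCi = 37 MBq

def mbq_to_mci(mbq):
    """Convert an activity in megabecquerels to millicuries."""
    return mbq / MBQ_PER_MCI

dose_mci = mbq_to_mci(185)
print(dose_mci)  # 5.0
```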

The license holder for Vizamyl™ is GE Healthcare, and the full prescribing information can be found here.

Monday, 28 October 2013

EU-OPENSCREEN 3rd Stakeholder Meeting, Oslo, Norway

Dear future user, partner, collaborator or supporter!

The ESFRI project EU-OPENSCREEN is an academic infrastructure initiative in Chemical Biology to serve your research needs. We are currently preparing the implementation of this pan-European infrastructure of open screening platforms to support basic and applied research. EU-OPENSCREEN will offer access to a unique compound library representing the know-how of European chemists, to a broad range of cutting-edge screening technologies, to valuable tool compounds for research, and to the knowledge that emerges from validated output of hundreds of screens stored and made publicly available in a central database.

We cordially invite you to join us in Oslo for an exciting science day, where we will report on the progress of the project and the planned services, including the design of the joint European Compound Library, the screening services and the database. In particular, we would like you to share your own experiences from academic screening projects, and we invite you to present your projects as posters. From these, highlight projects will be selected for oral presentation.

See for more details.

Sunday, 27 October 2013

Competition Time - Win a Raspberry Pi with ChEMBL - chempi

Here's a free-to-enter competition for a brand new, fully working Raspberry Pi running the brand new chempi implementation. It includes everything you need to get started at home with ChEMBL (a sort of in silico Breaking Bad maybe; hopefully not, thinking about it), with the exception of a power supply and ethernet cable.

We have run out of creative juices, and cannot think of a suitable poem to mark the release of chempi - so the competition is for you to finish a limerick for us, starting with the line:

There once was a hacker with chempi....

Entries must be posted in the comments section. Obscene or defamatory entries will be removed (all comments are moderated, so it may take a few hours for your entry to appear, so do not repost twenty times!). We haven't really decided how to pronounce chempi (with a hard 'k' start or a soft 'sh' start, just as with ChEMBL, both are used in the wild; and also does it rhyme with scampi, or the irrational number pi?). All entries will be assumed to be made under CC-BY licensing. The competition will be open until noon GMT on Sunday 10th November 2013.

Entries will be judged for compliance with the standard limerick format, outrageous rhymes with chempi, gratuitous chemistry references, and finally humour.

The judge's decision (i.e. mine) is final. The winning entry will be published on the ChEMBL-og.


PS Before I get asked, the competition is not open to members of the ChEMBL group, or extended family members of the ChEMBL group.

Tastypie & Chempi

One of the immediate consequences of refactoring our webservices using Django, Tastypie and related approaches (as described here) is that we can run them on almost any database backend. Django abstracts communication with the database, and using custom QueryManagers we were able to implement chemistry-specific operations, such as substructure and similarity search, in a database-agnostic manner.
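To illustrate the general idea (this is not the actual ChEMBL code), here is a minimal plain-Python sketch of hiding backend-specific chemistry SQL behind one interface. In a real Django project this logic would live in a custom models.Manager subclass. The table and column names are made up, the "other_cartridge" entry and its function are pure placeholders, and the PostgreSQL fragment assumes the RDKit cartridge's "@>" substructure operator.

```python
# Hypothetical per-backend SQL fragments for a substructure search.
SUBSTRUCTURE_SQL = {
    # RDKit PostgreSQL cartridge: "@>" is true when the molecule on the
    # left contains the query structure on the right.
    "postgres_rdkit": (
        "SELECT molregno FROM compound_structures WHERE mol @> %s::mol"
    ),
    # Placeholder for some other cartridge; the function name is invented.
    "other_cartridge": (
        "SELECT molregno FROM compound_structures "
        "WHERE hypothetical_substructure(mol, %s) = 1"
    ),
}

class CompoundManager:
    """Builds chemistry queries without exposing backend-specific SQL."""

    def __init__(self, backend):
        if backend not in SUBSTRUCTURE_SQL:
            raise ValueError(f"unsupported backend: {backend}")
        self.backend = backend

    def substructure_query(self, smiles):
        """Return (sql, params) for a substructure search on this backend."""
        return SUBSTRUCTURE_SQL[self.backend], [smiles]

manager = CompoundManager("postgres_rdkit")
sql, params = manager.substructure_query("c1ccccc1")
print(sql, params)
```

Calling code only ever sees substructure_query(), so swapping the database means changing the backend name, not the callers.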

This means that, if we want, we can use only Open Source components (such as Postgres and RDKit), or elect to use optimised commercially sourced software as appropriate. However, what if we go one step further and try to use Open Hardware as well? This is exactly what we've just done! We managed to install the full ChEMBL 17 on a Raspberry Pi.

Some frequently asked questions (at least those that have been asked internally) and technical details are below:

1. How much space does it take?

12 GB, including the OS, data and all relevant software. Unfortunately we used a 32 GB SD card, so that is the size you will need if you would like to use our cloned disk image.

EDIT: The compressed image takes 4.13 GB.

2. What OS is it running?

Raspbian, a free operating system based on Debian.

3. Is it slow?

We haven't run any benchmarks yet. Obviously it's slower than our online web services - but then it's a lot cheaper. On the other hand, having performed some sample requests, we can say that performance is certainly acceptable, and there is a lot of room for improvement: Raspberry Pis can easily be overclocked from 700 MHz to 1 GHz, and according to some benchmarks this can double application speed in some cases. The SD card we used is not the fastest one either. Finally, all caching is disabled because we wanted to save disk space, but using database caching from the Django caching framework should give further major improvements - so maybe use the 32 GB image after all.
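For readers who want to try the caching route, a minimal sketch of a Django settings fragment that enables the database cache backend follows. The table name and timeout are arbitrary illustrative values, not the settings used on chempi.

```python
# Fragment of a Django settings.py (assumed project layout).
CACHES = {
    "default": {
        # Django's built-in cache backend that stores entries in a database table.
        "BACKEND": "django.core.cache.backends.db.DatabaseCache",
        # Name of the cache table (hypothetical).
        "LOCATION": "chempi_cache",
        # Seconds before a cached entry expires (illustrative value).
        "TIMEOUT": 60 * 60,
    }
}
```

The cache table itself is created with Django's createcachetable management command. Cached responses take up space on the SD card, which is the trade-off mentioned above.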

Types of request that chempi can be slower on are:

- Image generation. If we replace the image with JSON from which the image can be generated on the client side using HTML5 canvas (the way we generated images in our game), it can be much faster. More about this topic in a future blog post.
- Queries using aggregate functions such as COUNT (it seems that we need to optimise our Postgres database by adding some more indexes).
- Substructure and similarity search. Again, caching, overclocking and some database and cartridge optimisation (such as choosing faster fingerprints) should solve all the problems. "Premature optimization is the root of all evil", so we first wanted a proof of concept that just works, not necessarily one that works super fast.

4. Can I make my own chempi?

Yes, we are planning to share our SD card image. We will probably use the BitTorrent protocol to do this, due to the image size and some issues we have faced with the distribution of myChEMBL. We do remember that not everyone has mega-fast broadband!

5. Is chempi useful at all?

Although we think it is interesting as a proof of concept to have a chemical database on such small, open-source hardware, we also think it may have some interesting real-world applications in future:

- Plugging our chempi into a local network makes it immediately accessible to other computers, so this is a zero-configuration demonstration of ChEMBL.
- In line with the thesis put forward in this paper, it can encourage cheminformatics education on low-cost ARM hardware.
- A Raspberry Pi can easily be fitted with a camera to perform image recognition. This, combined with software like OSRA, could make it possible to scan compound images and search for them in the database.
- Adding an e-ink display (for example, a jailbroken Kindle?) could produce a very interesting small machine...

6. What are some of the technical details?

To deploy our webservices (which are just another Django application) we've used Gunicorn as the server, which in turn connects to NGINX via a standard Unix socket. To make it work as a daemon and launch on machine startup, we've used Supervisor. We believe this is an ideal way to deploy Django, not only on a Raspberry Pi but on all production machines, so if you would like to run the ChEMBL webservices locally in your company or academic institution, we suggest doing it this way.
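As a rough illustration of the Gunicorn part of such a setup, here is a minimal configuration sketch. Gunicorn reads its settings from a plain Python file; the socket path and the values below are placeholders, not the ones used on chempi.

```python
# gunicorn.conf.py: Gunicorn settings are ordinary Python variables.
import multiprocessing

# Listen on a Unix domain socket that NGINX proxies requests to
# (placeholder path).
bind = "unix:/tmp/chembl_ws.sock"

# A Raspberry Pi has very few cores, so keep the worker count small.
workers = min(2, multiprocessing.cpu_count())

# Seconds before an unresponsive worker is killed and restarted;
# chemical searches on slow hardware may need a generous limit.
timeout = 120
```

Supervisor would then be configured to run something like "gunicorn -c gunicorn.conf.py myproject.wsgi:application" at startup, where myproject is a hypothetical Django project name.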


Saturday, 26 October 2013

USAN Watch: October 2013

The USANs for October 2013 have recently been published.

We have modified the sourcing of this data - using the new ChEMBL API to automatically parse the documents, extract and validate the mol files for the compounds. So in future, these reports should be more timely, complete and fun!

| USAN | Research Code | InChIKey (Parent) | Drug Class | Therapeutic class | Target |
| | AF-802; CH-5424802 | KDGFLJKFZUIJMX-UHFFFAOYSA-N | synthetic small molecule | therapeutic | ALK |
| | GDC-0980.1, G-038390, G-038390.1, RG-7422 | YOVVNQKCSKSHKT-HNNXBMFYSA-N | synthetic small molecule | therapeutic | MTOR,PI3K |
| cimaglermin-alfa | GGF2, rhGGF2 | n/a | protein | therapeutic | ErbB |
| | VX-509, VRT-831509 | ASUGUQWIHMTFJL-QGZVFWFLSA-N | synthetic small molecule | therapeutic | JAK3 |
| | A-3309; AZD-7806 | | | | |
| | GDC-0068; RG-7440 | GRZXWCHAXNAUHY-NSISKUIASA-N | synthetic small molecule | therapeutic | AKT |
| | | n/a | peptide | therapeutic | GLP1R |
| | MDX-1338, BMS-936564 | n/a | monoclonal antibody | therapeutic | CXCR4 |
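When records like these are extracted automatically, a cheap sanity check is to confirm that each InChIKey at least has the right shape before doing anything chemical with it. The sketch below checks format only (14 letters, 10 letters, 1 letter, separated by hyphens) and is an illustration, not the validation actually used for these reports.

```python
import re

# Shape of an InChIKey: 14 + 10 + 1 uppercase letters, hyphen-separated.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")

def looks_like_inchikey(value):
    """True if the string has the layout of an InChIKey (format check only)."""
    return INCHIKEY_RE.fullmatch(value) is not None

keys = [
    "KDGFLJKFZUIJMX-UHFFFAOYSA-N",
    "YOVVNQKCSKSHKT-HNNXBMFYSA-N",
    "ASUGUQWIHMTFJL-QGZVFWFLSA-N",
    "GRZXWCHAXNAUHY-NSISKUIASA-N",
    "n/a",  # biologics in the table have no InChIKey
]
results = {key: looks_like_inchikey(key) for key in keys}
print(results)
```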

Friday, 25 October 2013

New Drug Approvals 2013 - Pt. XVII - Macitentan (Opsumit ®)

ATC Code: C02KX (incomplete)
Wikipedia: Macitentan

On October 13th the FDA approved Macitentan (trade name Opsumit ®) for the treatment of pulmonary arterial hypertension (PAH). Macitentan is an endothelin receptor antagonist with affinity for both the Endothelin ET-A (ETA) and Endothelin ET-B (ETB) receptor subtypes, similar in mechanism of action to the previously licensed drug Bosentan (CHEMBLID957).

The Endothelin ET-A (ETA, CHEMBLID252; Uniprot P25101) and Endothelin ET-B (ETB, CHEMBLID1785; Uniprot P24530) receptors mediate a number of physiological effects via the natural peptide agonist Endothelin-1 (ET1, CHEMBL437472; Uniprot P05305). In addition to normal roles in supporting homeostasis, these effects can include pathologies such as inflammation, vasoconstriction, fibrosis and hypertrophy.

Macitentan acts as an antagonist of both receptors, with both high affinity and a long residence time in human pulmonary arterial smooth muscle cells. Hence it counteracts vasoconstriction and relieves hypertension. One of the metabolites of Macitentan is also pharmacologically active at the ET receptors and is estimated to be about 20% as potent as the parent drug in vitro.

Macitentan (CHEMBL2103873; PubChem: 16004692) is a small molecule drug with a molecular weight of 588.3 Da, an AlogP of 3.67, 11 rotatable bonds, and 1 rule of 5 violation.

Canonical SMILES : CCCNS(=O)(=O)Nc1ncnc(OCCOc2ncc(Br)cn2)c1c3ccc(Br)cc3
InChI: InChI=1S/C19H20Br2N6O4S/c1-2-7-26-32(28,29)27-17-16(13-3-5-14(20)6-4-13)18(25-12-24-17)30-8-9-31-19-22-10-15(21)11-23-19/h3-6,10-12,26H,2,7-9H2,1H3,(H,24,25,27)

10 mg once daily. Doses higher than 10 mg once daily have not been studied in patients with PAH and are not recommended.

Metabolism and Elimination 
Following oral administration, the apparent elimination half-lives of macitentan and its active metabolite are approximately 16 hours and 48 hours, respectively. Macitentan is metabolized primarily by oxidative depropylation of the sulfamide to form the pharmacologically active metabolite. This reaction is dependent on the cytochrome P450 (CYP) system, mainly CYP3A4 with a minor contribution of CYP2C19. It is interesting to note the presence of bromine atoms on two of the aryl rings; typically a lighter halogen, usually fluorine, is used to block oxidative P450-mediated metabolism at such exposed aromatic positions.

At steady state in PAH patients, the systemic exposure to the active metabolite is three times the exposure to macitentan and is expected to contribute approximately 40% of the total pharmacologic activity. In a study in healthy subjects with radiolabeled macitentan, approximately 50% of radioactive drug material was eliminated in urine but none was in the form of unchanged drug or the active metabolite. About 24% of the radioactive drug material was recovered from feces.
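The figure of roughly 40% is consistent with the other numbers quoted in this post, under a deliberately simple assumption: that each species contributes activity in proportion to potency multiplied by exposure. The sketch below is a back-of-the-envelope check only, not how the prescribing information derives its estimate.

```python
# Figures quoted in this post: the metabolite is about 20% as potent as
# the parent in vitro, and its steady-state exposure is about 3 times higher.
RELATIVE_POTENCY = 0.20
RELATIVE_EXPOSURE = 3.0

def metabolite_share(relative_potency, relative_exposure):
    """Fraction of total activity due to the metabolite, assuming activity
    scales as potency x exposure and the parent drug counts as 1 x 1."""
    metabolite_activity = relative_potency * relative_exposure
    return metabolite_activity / (1.0 + metabolite_activity)

share = metabolite_share(RELATIVE_POTENCY, RELATIVE_EXPOSURE)
print(f"{share:.1%}")  # 37.5%
```

That is, 0.6 units of metabolite activity against 1.0 for the parent, or 0.6 / 1.6 of the total.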

Macitentan may cause fetal harm when administered to a pregnant woman. Macitentan is contraindicated in females who are pregnant.

Other ERAs have caused elevations of aminotransferases, hepatotoxicity, and liver failure. Obtain liver enzyme tests prior to initiation of Macitentan and repeat during treatment as clinically indicated.

Hemoglobin Decrease 
Decreases in hemoglobin concentration and hematocrit have occurred following administration of other ERAs and were observed in clinical studies with Macitentan. These decreases occurred early and stabilized thereafter. Initiation of Macitentan is not recommended in patients with severe anemia. Measure hemoglobin prior to initiation of treatment and repeat during treatment as clinically indicated.

Strong CYP3A4 Inducers / Inhibitors
Strong inducers of CYP3A4 such as rifampin significantly reduce macitentan exposure, whereas concomitant use of strong CYP3A4 inhibitors like ketoconazole approximately doubles macitentan exposure. Many HIV drugs like ritonavir (CHEMBL163) are strong inhibitors of CYP3A4.

The license holder is Actelion Pharmaceuticals US, and the full prescribing information can be found here.

Tuesday, 22 October 2013

ChEMBL Web Service Update 3: Image Rendering Changes

If you are a follower of this blog you will have seen some earlier posts (here and here) providing details on changes we are making to our Web Services. I recommend reviewing the previous posts, but in summary we have set up a temporary base URL to allow existing ChEMBL Web Service users to test the new ChEMBL API powered Web Services. The new temporary base URL is:

As well as providing users with all existing functionality, we have also added a couple of extra features, one of which is improved molecule rendering options. The current live Web Services provide the following REST call to allow you to get a molecule image:


You are able to provide a dimension argument (pixels) to change the size of the image:

The image quality has deteriorated because the image returned is simply a resized version of the first image. The new ChEMBL API powered Web Services address this issue by dynamically generating the images, using either the RDKit or the Indigo chemistry toolkits (defaults to RDKit). So, to get an image using the new services, you just need to add '2' to the base URL:

When using the dimensions argument with the new Web Services you now get the following improved smaller image:

The coordinates used to generate the image are based on those found in the CHEMBL192 molfile. All current ChEMBL images are produced using Pipeline Pilot, which is currently set up to ignore the molfile coordinates and lay out the molecule how it sees best. This explains why the layout of the first two images is different from that of the second two. We can get the new Web Services to ignore the coordinates and have the chemical toolkit lay out the molecule how it sees best using the ignoreCoords=1 argument:

If you would prefer to use Indigo to generate your ChEMBL molecule images you can use the engine argument:

Finally, it is also possible to use any combination of the 3 arguments mentioned above:

In summary, the new Web Service base URL extends the current image generating functionality by improving the dimensions argument and introducing the ignoreCoords and engine arguments. More details are in the table below:

| Argument Name | Argument Description | Argument Options | Default |
| dimensions | Size of image in pixels | 1-500 | 500 |
| ignoreCoords | Choose to use or ignore coordinates in ChEMBL molfiles | 1 or 0 | 0 (Use ChEMBL molfile coordinates) |
| engine | Chemical toolkit used to generate image | RDKit or indigo | RDKit |
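As an illustration of how the three arguments combine, here is a small sketch that validates them against the table above and assembles an image URL. The base URL is a placeholder (the real one is not reproduced here) and the "/compounds/<id>/image" path layout is an assumption; only the argument names, ranges and options come from the table.

```python
from urllib.parse import urlencode

# Placeholder only: substitute the real Web Service base URL.
BASE_URL = "https://example.org/chemblws2"

ENGINES = {"rdkit", "indigo"}

def image_url(chembl_id, dimensions=None, ignore_coords=None, engine=None):
    """Build an image request URL; arguments left as None are omitted."""
    params = {}
    if dimensions is not None:
        if not 1 <= dimensions <= 500:
            raise ValueError("dimensions must be between 1 and 500 pixels")
        params["dimensions"] = dimensions
    if ignore_coords is not None:
        if ignore_coords not in (0, 1):
            raise ValueError("ignoreCoords must be 0 or 1")
        params["ignoreCoords"] = ignore_coords
    if engine is not None:
        if engine.lower() not in ENGINES:
            raise ValueError("engine must be RDKit or indigo")
        params["engine"] = engine
    # Assumed path layout for the image resource.
    url = f"{BASE_URL}/compounds/{chembl_id}/image"
    return f"{url}?{urlencode(params)}" if params else url

print(image_url("CHEMBL192", dimensions=200, ignore_coords=1, engine="indigo"))
```

Omitted arguments fall back to the service defaults listed in the table, so image_url("CHEMBL192") on its own requests a 500 pixel RDKit image using the molfile coordinates.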

We hope you find these image rendering changes useful. If you have any questions, please let us know via mail to "chembl-help at".

The ChEMBL Team