Tuesday, 30 October 2012

Wellcome Trust Courses - Computational Resources For Drug Discovery 2013

Those of you who went on the course we ran this year will know how much fun it was - and from our perspective we're gonna keep on doing it till we get it right! So there is another chance to attend the course in 2013: 9th to 13th December 2013, to be precise.

So if you are interested, pencil the dates into your diary now, set an automatic alarm for four months beforehand, and check out the full course details then.

Of course, there are lots of other excellent courses in the same series, and the poster is available for download here, to display on your office wall.

The First Rule of Security Club is that you do not talk about Security Club

We worry about data security and privacy a lot. I fret and sweat over this; it is one of the things (alongside being late with EU reports) that genuinely keep me awake at night, and one that you can never know too much about (again, a bit like the EU). We have started to collect examples of security and data privacy issues and vulnerabilities in online chemistry-related resources. The first aim is to build a set of real-world examples and to establish best practice for our own developers. The collection also lets us create an environment in which security and privacy matters can be discussed privately, without the world being unnecessarily alerted to them, so that fixes can be made and the online chemistry world is generally kept a better, safer place.

As would be expected for this sort of thing, the list will not be open and will not be indexed by Google (if it is right now, we've failed at step one!). So if you're interested in joining the list, and your job involves building or maintaining online chemistry systems with a security/privacy responsibility, get in touch in the normal way...

Clinical Development Candidate Annotatathon - July 2013

We are thinking of holding an annotatathon for clinical development stage compounds next July, here on campus at the EBI in Hinxton. At this event we will assign and curate efficacy targets for all the clinical stage compounds we have identified by then, simplifying the work by pre-clustering the compounds by chemical class/therapeutic area. Data generated during the event will be placed online immediately, and will of course be fully Open (none of this frustrating "online access only for us" business!).

If there is interest in taking part and contributing to this effort, let me know! Depending on the level of interest, I may apply for funding to help with travel/accommodation. If you are interested in helping to fund this, we'd be delighted to hear from you.

Monday, 29 October 2012

PubMed² - Experimenting with biomedical literature for tablets and smart phones

We're still playing around with data visualisation, and this week's experiment focuses on the scientific literature and is designed with tablet devices (such as the iPad or the Nexus 7) and smartphones in mind. The application is a rethinking of PubMed's search interface, and you can get to play with it here at

Let us know in the comments what you think.

Masters Project - Ion-channel structural pharmacology

We have a position in the group in the area of ion-channel structural pharmacology - mapping known ion-channel modulators to sequences and binding sites. This will be in partnership with Pfizer, and the role will involve time spent both at the EBI and at Pfizer's labs in the Cambridge UK area - so a great opportunity to pick up some industrial experience.

If you are interested, please get in touch by December 15th 2012, when we will shortlist candidates for interview.

Sunday, 28 October 2012

Random Notes on Open Drug Discovery/Data Sharing: Part 1

There are some fantastic initiatives in Open Drug Discovery going on at the moment. I, for one, am convinced that we are on the cusp of a large structural change in drug discovery, and, as at the beginning of all revolutions, the future is not clear and we are all a little bit excited and nervous at the same time. One of the commonly quoted benefits of an Open strategy is that it can avoid duplication, and if you avoid duplication, you get to the goal faster and cheaper (since other researchers can explore alternative approaches) and there is no repetition. There, you've just read it, and it's quite seductive, isn't it?

I've never quite bought this "avoid duplication" argument, for the following three reasons. (I should declare my political/philosophical hand here: I have a very deep-rooted empathy with the concept of The Free Market. Not the goofy, fudged form that we've had in Western economies for some time - but that really is a different story, for another time.)

1) Scientists are not perfect, and they mess things up now and then. The "no duplication" strategy places a lot of weight on the capabilities of a single group, who may not make the best decisions or have the best approach to data analysis/design etc. There is a lot of discussion in the literature at the moment about the non-repeatability of key pharmacology data, so not having several parallel attempts at a problem seems a little rash, given the probably high likelihood of individual failure. If an individual group has a 0.6 probability of getting something done within a given time and given funding, then two groups (with the same likelihood of success/failure) working in parallel have a probability of 0.84 of getting it done. Simples.

2) Competition is well established as one of the major drivers of rapid completion in almost all endeavours of life; if you have someone breathing down your neck, potentially scooping you on a paper, you think in a different way and tend to stay focussed on the task in hand. Conversely, given a fixed time to complete a piece of work with preplanned and coordinated deliverables, the work miraculously fills the time and funding available.

3) Who will make the decisions over non-duplication, effectively saying "you will not work on this compound series, another group will"? And will people abide by those decisions? We all know that grant committees are useless (unless we are on them, of course), and without a lot of process transparency, things could rapidly descend into slow chaos and confusion.
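The arithmetic in point 1 implicitly assumes that the groups succeed or fail independently of each other (optimistic if they share methods, reagents or assumptions). Under that assumption, the only way to fail overall is for every group to fail, so the chance that at least one of n groups succeeds is 1 - (1 - p)^n. A minimal Python sketch of that calculation; the function name is illustrative only:

```python
def p_at_least_one(p_single: float, n_groups: int) -> float:
    """Probability that at least one of n independent groups succeeds.

    Assumes every group has the same success probability and that
    their outcomes are independent of each other.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    # Overall failure requires every single group to fail.
    p_all_fail = (1.0 - p_single) ** n_groups
    return 1.0 - p_all_fail


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, round(p_at_least_one(0.6, n), 3))
```

With p = 0.6 this gives 0.6, 0.84 and 0.936 for one, two and three groups, so each extra parallel attempt buys less than the one before.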

However, I think the arguments for rapid data sharing are very, very strong, primarily because sharing increases liquidity and transparency in the market and allows market participants (individual labs and funders) to make more rational decisions on the allocation of their resources. For me this is the biggest single reason for data sharing (i.e. it actually increases competition rather than decreasing it). The Free Market of Knowledge in Drug Discovery will drive participants to their best composite roles, based on their abilities.

Saturday, 27 October 2012

Paper: Cheminformatics - Communications of the ACM

Here is a review article on cheminformatics, written as an orientation piece for people from a computational sciences background.

%T Cheminformatics
%A J.K. Wegner
%A A. Sterling
%A R. Guha
%A A. Bender
%A J.-L. Faulon
%A J. Hastings
%A N. O'Boyle
%A J. Overington
%A H. Van Vlijmen
%A E. Willighagen
%J Communications of the ACM
%D 2012
%V 55
%N 11
%P 65-75
%O DOI:10.1145/2366316.2366334

Thursday, 25 October 2012

ChEMBL - now with added DOIness

In order to provide ChEMBL users with a persistent and citable link to datasets that have been deposited in ChEMBL, we have started registering DOIs (Digital Object Identifiers) for these datasets. Many of you will be familiar with the use of DOIs as identifiers for journal articles, but they can be used for any document that you want to permanently identify and share with others. By doing this, we are providing a way of citing a deposited dataset in exactly the same way as you would cite a scientific publication.

We are also hoping that, by issuing DOIs for deposited data, we will encourage people to contribute additional data to the ChEMBL database, as the DOI will provide them with a permanent way to reference their contribution, for example in a subsequent publication.

At the moment we have DOIs for four of the deposited datasets in the ChEMBL database. Two are results from screens of the GSK PKIS set and two are datasets measured as part of DNDi; we expect the number to increase. These datasets and their DOIs are shown below.

Compounds: GSK PKIS; Assays: Nanosyn kinase panel
Compounds: GSK PKIS; Assays: UNC Frye lab
Screening and optimization of specific chemical series against human African Trypanosomiasis (HAT)
Optimisation of fenarimol series for the treatment of Chagas disease

Each DOI resolves to the corresponding ChEMBL Document Report Card on the website.
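Resolving a DOI works the same way whatever it identifies: the DOI is appended to the public resolver at https://doi.org/, which redirects to the landing page registered for it (for the datasets above, a ChEMBL Document Report Card). A minimal Python sketch of building that resolver URL, using the two article DOIs cited in this section as examples rather than the dataset DOIs. The function names are hypothetical, and the syntax check is a heuristic pattern published by Crossref that covers most modern DOIs, not the full DOI specification:

```python
import re
from urllib.parse import quote

# Heuristic pattern (published by Crossref) matching the vast majority of modern DOIs.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)


def normalise_doi(raw: str) -> str:
    """Strip whitespace and an optional 'doi:' prefix, as in 'DOI:10.1145/...'."""
    doi = raw.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    return doi


def resolver_url(raw: str) -> str:
    """Return the https://doi.org/ URL that redirects to the registered landing page."""
    doi = normalise_doi(raw)
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"does not look like a DOI: {raw!r}")
    # Keep '/' unescaped; characters such as ';' or '(' are percent-encoded.
    return "https://doi.org/" + quote(doi, safe="/")


if __name__ == "__main__":
    for ref in ("DOI:10.1145/2366316.2366334", "DOI:10.4155/fmc.12.159"):
        print(resolver_url(ref))
```

Keeping normalisation separate from URL building means the same helper can accept the `DOI:` form used in the citation blocks here as well as a bare DOI.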

Open data for drug discovery: learning from the biological community

We've just co-authored an editorial on Open Data with a collaborator from GSK; it is available here...

%T Open data for drug discovery: learning from the biological community
%A A. Hersey
%A S. Senger
%A J.P. Overington
%J Future Medicinal Chemistry
%D 2012
%V 4
%N 10
%P 1865-1867
%O DOI:10.4155/fmc.12.159

The picture of the Fifty Shades of Grey dog I found on the internet somewhere...

Sunday, 21 October 2012

Interest in a ChEMBL seminar in your lab in the South-East of the US next Spring?

I'm chairing a session at the ACS Spring meeting in New Orleans in early April 2013 (the ACS meeting itself runs from the 7th to the 11th, but I'll probably be finished by the 9th), and am making visits to Miami and Kentucky on the same trip. I can probably squeeze in two more lab visits if there is interest in a ChEMBL seminar (I would need to leave the US by Tuesday 16th April at the latest). I'll need to look into the practicalities of travel, and realistically the visits will need to be in the South-East, but I'm pretty hardy in the air - I don't mind oddly timed flights.

So if there is any interest, let me know.

Friday, 19 October 2012

Drug Approval Timeline Visualisation

We're playing around with some visualisation techniques at the moment for ChEMBL, with one of our interests being the display of timelines. Here is a little standalone visualisation of the timeline of FDA drug approvals, annotated with ATC codes, loaded with some toy data (so don't rely on it for structures/publications/analysis!).

Update: The toolkit we use appears to be quite browser-sensitive. We'll look into this, but at the moment it doesn't seem to render in Chrome...
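Annotating a timeline with ATC codes generally means choosing one level of the ATC hierarchy to group drugs by. A full ATC code has five nested levels (anatomical main group, therapeutic subgroup, pharmacological subgroup, chemical subgroup, chemical substance), so each level is simply a prefix of the seven-character code. A minimal Python sketch of that grouping step, assuming approvals arrive as hypothetical (drug, year, ATC code) tuples; it is not the code behind the visualisation, and the only record used is Regorafenib (L01XE21, approved 2012) from the approval post in this section:

```python
import re
from collections import defaultdict

# Full ATC code: letter, 2 digits, letter, letter, 2 digits (e.g. L01XE21).
ATC_PATTERN = re.compile(r"^[A-Z]\d{2}[A-Z]{2}\d{2}$")
LEVEL_LENGTHS = (1, 3, 4, 5, 7)  # prefix length for ATC levels 1 to 5


def atc_levels(code: str) -> list[str]:
    """Return the five nested ATC prefixes, from anatomical group down to substance."""
    code = code.strip().upper()
    if not ATC_PATTERN.match(code):
        raise ValueError(f"not a full 7-character ATC code: {code!r}")
    return [code[:n] for n in LEVEL_LENGTHS]


def group_by_year_and_level(approvals, level: int):
    """Bucket hypothetical (drug, year, atc_code) records by year and ATC prefix.

    level runs from 1 (anatomical main group) to 5 (chemical substance).
    """
    buckets = defaultdict(list)
    for drug, year, code in approvals:
        prefix = atc_levels(code)[level - 1]
        buckets[(year, prefix)].append(drug)
    return dict(buckets)


if __name__ == "__main__":
    approvals = [("regorafenib", 2012, "L01XE21")]
    print(atc_levels("L01XE21"))
    print(group_by_year_and_level(approvals, level=2))
```

Grouping at level 1 gives a coarse, readable timeline; deeper levels separate, for example, kinase inhibitors from other antineoplastics at the cost of many more, sparser lanes.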

Tuesday, 9 October 2012

Brain-1.0 - Biomedical knowledge manipulation

The world of data informatics is seeing a cultural change, from a world of databases, such as ChEMBL, to one where data is more self-descriptive and queryable ad hoc: an evolution into knowledge-bases. The data will be organized around controlled dictionaries and ontologies (the Semantic Web), more exposed to programmatic and web service infrastructures, and more robustly linked to other repositories (for a current example of a large, nascent network of coordinated data repositories, see the ELIXIR project).

Brain is a library created to achieve such linkage: it can handle and query large biomedical knowledge-bases. The Brain library can also serve as a framework for users interested in Description Logic and biology.
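The most basic question a Description Logic knowledge-base answers is subsumption: which classes fall under a given class once the asserted subClassOf axioms are followed transitively. A minimal Python sketch of just that idea, over a hypothetical toy hierarchy. It is not Brain's API and not a real DL reasoner, which would also handle intersections, restrictions and equivalences:

```python
from collections import defaultdict

# Hypothetical toy axioms as (subclass, superclass) pairs; not from any real ontology.
AXIOMS = [
    ("KinaseInhibitor", "EnzymeInhibitor"),
    ("ProteaseInhibitor", "EnzymeInhibitor"),
    ("TyrosineKinaseInhibitor", "KinaseInhibitor"),
]


def build_hierarchy(axioms):
    """Map each class to the set of its asserted direct subclasses."""
    children = defaultdict(set)
    for sub, sup in axioms:
        children[sup].add(sub)
    return children


def all_subclasses(children, cls):
    """Return every class reachable from cls by following subClassOf links downwards."""
    found, stack = set(), [cls]
    while stack:
        # .get avoids inserting empty entries into the defaultdict.
        for child in children.get(stack.pop(), ()):
            if child not in found:  # the visited set also guards against cycles
                found.add(child)
                stack.append(child)
    return found


if __name__ == "__main__":
    hierarchy = build_hierarchy(AXIOMS)
    print(sorted(all_subclasses(hierarchy, "EnzymeInhibitor")))
```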

Website of the library:

Monday, 8 October 2012

Is there any interest in an RSS feed of ChEMBL?

Sometimes, when something great happens, you've just got to find out, or tell the world, straight away, just like those proud fellas in the picture above.

So, would there be interest in an RSS feed of selected ChEMBL content? The sort of thing that would probably be useful is alerts for new bioactivity data on an already existing ChEMBL compound, or for an already existing ChEMBL target/assay.
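For those unfamiliar with the format, an RSS 2.0 feed is a small XML document: a channel with a title, link and description, holding one item per new event. A minimal Python sketch, using only the standard library, of what a per-compound alert feed could look like. The feed layout, titles and example.org URLs are hypothetical placeholders rather than a planned ChEMBL service; CHEMBL1946170 (Regorafenib's ID, from the approval post in this section) is used only as an example:

```python
import xml.etree.ElementTree as ET


def build_rss(channel_title, channel_link, channel_description, items):
    """Return an RSS 2.0 document as a string.

    items is a list of dicts with 'title', 'link', 'description' and 'guid' keys.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements in RSS 2.0.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_description
    for item in items:
        node = ET.SubElement(channel, "item")
        for key in ("title", "link", "description", "guid"):
            ET.SubElement(node, key).text = item[key]
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical alert for new bioactivity on an existing compound; URLs are placeholders.
    feed = build_rss(
        "ChEMBL alerts: CHEMBL1946170",
        "https://example.org/chembl/alerts/CHEMBL1946170",
        "New bioactivity data for a watched compound",
        [{
            "title": "New activity for CHEMBL1946170",
            "link": "https://example.org/chembl/alerts/CHEMBL1946170/1",
            "description": "Hypothetical example item",
            "guid": "https://example.org/chembl/alerts/CHEMBL1946170/1",
        }],
    )
    print(feed)
```

One feed per watched compound or target keeps each feed small, and feed readers use the `guid` to avoid showing the same alert twice.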

Any comments/mail gratefully received. If there is interest, we'll integrate this into our plans for next year.

Sunday, 7 October 2012

New Drug Approvals 2012 - Pt. XXI - Regorafenib (Stivarga®)

ATC code: L01XE21
Wikipedia: Regorafenib

On September 27th 2012 the FDA approved Stivarga (Regorafenib) for the treatment of patients with metastatic colorectal cancer who have previously received chemotherapy, anti-EGFR or anti-VEGF therapy.

Colorectal (bowel) cancer is one of the most common cancers in the Western world: it is the third most common cancer in the United States and the second most common cause of cancer deaths in the United Kingdom (according to CRUK). While five-year survival rates for the primary cancer have been improving due to the availability of targeted therapy, they still only reach about 50%, and metastatic disease has a poor prognosis. In clinical trials, Regorafenib showed a statistically significant improvement in the survival of patients with metastatic disease compared with best supportive care alone, extending median survival from 5.0 to 6.4 months.

Regorafenib (research code: BAY-73-4506; ChEMBL ID: CHEMBL1946170) is a monohydrate; it has the molecular formula C21H15ClF4N4O3·H2O and a molecular weight of 500.83 (for the monohydrate). It has very poor water solubility.
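The molecular weight quoted above can be checked from the formula with a few lines of arithmetic. A minimal Python sketch, assuming IUPAC conventional atomic weights for the six elements involved (other atomic-weight tables can shift the last decimal place). It handles only simple formulas without brackets or charges, so the water of hydration is added as a separate formula:

```python
import re

# Conventional standard atomic weights (IUPAC abridged values), only for the elements needed here.
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "N": 14.007,
    "O": 15.999, "F": 18.998, "Cl": 35.45,
}
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def molecular_weight(formula: str) -> float:
    """Sum atomic weights for a simple formula such as 'C21H15ClF4N4O3'.

    Brackets, charges and hydrate dots are not handled.
    """
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for symbol, count in TOKEN.findall(formula):
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


if __name__ == "__main__":
    anhydrous = molecular_weight("C21H15ClF4N4O3")
    hydrate = anhydrous + molecular_weight("H2O")
    print(f"anhydrous {anhydrous:.2f}, monohydrate {hydrate:.2f}")
```

This gives about 482.82 for the anhydrous form and 500.83 for the monohydrate, consistent with the value above.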

Regorafenib is metabolized by CYP3A4 and UGT1A9. The main circulating metabolites of regorafenib measured at steady-state in human plasma are M-2 (N-oxide) and M-5 (N-oxide and N-desmethyl).

Regorafenib has been approved with a boxed warning due to severe hepatotoxicity which can be fatal.

At clinically achieved concentrations, Regorafenib and its primary active metabolites M-2 and M-5 are multi-kinase inhibitors, showing in vitro activity against RET (P07949), VEGFR1 (FLT1, P17948), VEGFR2 (KDR, P35968), VEGFR3 (FLT4, P35916), KIT (P10721), PDGFR-alpha (P16234), PDGFR-beta (P09619), FGFR1 (P11362), FGFR2 (P21802), TIE2 (Q02763), DDR2 (Q16832), Trk2A, Eph2A, RAF1 (P04049), BRAF, BRAFV600E (BRAF, P15056), SAPK2 (MAPK11, Q15759), PTK5 (FRK, P42685) and ABL1 (P00519).

Prescribing information is here.

Stivarga is marketed by Bayer.

Saturday, 6 October 2012

Position in Computational Chemical Biology at Novartis in Cambridge MA

Novartis have a position available in Cambridge MA....

...for a highly motivated scientist in the in silico Lead Discovery group of the Center for Proteomic Chemistry platform. This is an exciting opportunity to perform cutting-edge computational research at their Cambridge USA site. The position will be responsible for developing robust computational hypotheses with a variety of drug discovery project teams for the purposes of lead discovery, and will apply state-of-the-art computational approaches to elucidate the biological profile of small molecules regarding their targets, off-targets and phenotypic outcome via an in-depth understanding of biological networks, systems biology, bioinformatics and cheminformatics.

Further details are available here.

Thursday, 4 October 2012

Looking for some help on a computational chemistry problem....

I have an interesting little conformational analysis/transition-state problem that is beyond my zone of competence (no cheeky comments!). It's a really interesting little research problem related to 'click' chemistry, and is probably about a week's work for someone who knows what they are doing. It involves conformational analysis of some systems of ~12 heavy atoms (CHNO only, no exotics), followed by some transition-state analysis. To me, it's the sort of thing that needs some MM conformational analysis, followed by MO calculations, and then some thinking through of the relative stability of the ground-state and transition-state forms.
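As a small illustration of the ground-state side of such a workflow: once a conformational search has produced relative energies, a common next step is to estimate how populated each conformer is at a given temperature by Boltzmann weighting. A minimal Python sketch; the energies are hypothetical placeholders with no connection to the problem described here, and a real study would typically weight by free energies from the MO calculations rather than raw MM energies:

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def boltzmann_populations(rel_energies_kcal, temperature_k=298.15):
    """Fractional populations of conformers from relative energies in kcal/mol."""
    e_min = min(rel_energies_kcal)
    rt = R_KCAL * temperature_k
    # Subtracting the minimum keeps the exponentials well scaled.
    weights = [math.exp(-(e - e_min) / rt) for e in rel_energies_kcal]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical relative energies for three conformers, e.g. from an MM search.
    energies = [0.0, 0.6, 1.5]
    for energy, pop in zip(energies, boltzmann_populations(energies)):
        print(f"{energy:4.1f} kcal/mol -> {pop:.2f}")
```

At room temperature RT is about 0.59 kcal/mol, so conformers more than roughly 2 kcal/mol above the minimum contribute only a few percent.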

Sorry I can't give explicit details here, but it is a cool idea, and if we can do something good, it will be a good publication (which of course, you would be a co-author on :) ).

So, if you're interested, get in touch....