ChEMBL Resources


Monday, 17 December 2012

Pipeline Pilot Cambridgeshire UGM

We will be organising the 2nd Cambridgeshire Pipeline Pilot Users Group meeting on Thursday 17th January 2013, at 3pm here at the ChEMBL HQs. This is provided that the Mayans were actually wrong. 

This is a preliminary agenda for the meeting:

1. Welcome and Host talk:  George Papadatos + Gerard van Westen:
      Cool things with Pipeline Pilot and ChEMBL
2. Peter Woollard (GSK):
      Using Pipeline Pilot for computational biology capabilities, where it has helped the most and where it is less used
3. Richard Carter (ONT):
       Pipeline Pilot on a memory stick
4. Mike Cherry (Accelrys):
        Repetitive Data Flow
5. Question and Answer session, including:
   - how people have found Next Generation Sequencing components  and the Text Analytics components
   - using Pipeline Pilot for running command line software on remote linux servers and retrieving results
6. Adrian Stevens (Accelrys)
      Upcoming chemistry components in PP9.0

If you fancy attending, drop me a line.


Friday, 14 December 2012

Paper: Mapping small molecule binding data to structural domains

Our interacting domains paper is out in pdf form. Here's the link.

%T Mapping small molecule binding data to structural domains
%A F.A. Kruger
%A R. Rostom
%A J.P. Overington
%J BMC Bioinformatics 
%D 2012
%V 13(Suppl 17)
%P S11 
%O doi:10.1186/1471-2105-13-S17-S11

Wednesday, 12 December 2012

Paper: Automated design of ligands to polypharmacological profiles

Another great paper in Nature this week, making extensive use of ChEMBL. It's by our long-term collaborators up at Dundee - Jeremy, Richard and Andrew - well done, great stuff! Basically it combines a knowledge-base of SAR data (ChEMBL), some predictive models for affinity/properties, and extracts a set of reasonable transforms (chemical conversions) from the same knowledge-base. I'll ask Jeremy/Andrew to do a guest post on the ChEMBL-og on the paper - they're probably pretty busy with press-releases, etc. ;)

Here's a link to the paper.

Have a read, it will keep you busy for a few hours.

%A J. Besnard
%A G.F. Ruda
%A V. Setola
%A K. Abecassis
%A R.M. Rodriguez
%A X.-P. Huang
%A S. Norval
%A M.F. Sassano
%A A.I. Shin
%A L.A. Webster
%A F.R.C. Simeons
%A L. Stojanovski
%A A. Prat
%A N.G. Seidah
%A D.B. Constam
%A G.R. Bickerton
%A K.D. Read
%A W.C Wetsel
%A I.H. Gilbert
%A B.L. Roth
%A A.L. Hopkins
%T Automated design of ligands to polypharmacological profiles
%J Nature
%D 2012
%V 492
%P 215-220

What will you do today with ChEMBL?

Tuesday, 11 December 2012

New Drug Approvals 2012 - Pt. XXVII - Choline C-11

On September 12, the FDA approved Choline C-11, an intravenous radioactive diagnostic agent to be used as a tracer during a Positron Emission Tomography (PET) scan to help detect sites of recurrent Prostate Cancer (OMIM: 176807; MeSH: D011471).

Prostate cancer is the most common cause of death from cancer in men over age 75, and is rarely found in men younger than 40. Unlike many other cancers, prostate cancer usually progresses very slowly. Sometimes the cancer cells may metastasize from the prostate to other parts of the body. Overall, it is estimated to be the sixth leading cause of cancer-related death in men.

Choline is a naturally occurring component of the vitamin B complex, and is necessary for normal cell structure and signalling. Choline C-11 is a radiolabelled synthetic analog of choline that releases a positron by beta decay, which can be visualised by PET. Choline is rapidly taken up by prostate cells, and this allows the prostate to be imaged.

Choline is a precursor molecule essential for the biosynthesis of phospholipids, which are the structural components of cell membranes, as well as for the modulation of trans-membrane signalling. Increased activity of phospholipid synthesis has been associated with increased cell proliferation and the transformation process that occurs in tumour cells.
Choline C-11 is a positron emitting radiopharmaceutical that is used for diagnostic purposes in conjunction with PET imaging. The active ingredient is Choline C-11 and each millilitre of the injection contains 148 MBq to 1225 MBq of the active ingredient.

IUPAC Name (Choline) : 2-hydroxy-N,N,N-trimethylethanaminium
Canonical Smiles : [Cl-].[11CH3][N+](C)(C)CCO
Standard InChI : 1S/C5H14NO.ClH/c1-6(2,3)4-5-7;/h7H,4-5H2,1-3H3;1H/q+1;/p-1/i1-1;

Following intravenous administration, Choline C-11 distributes mainly to the pancreas, kidney, liver, spleen and colon. Radioactivity accumulates rapidly within the prostate, with peak uptake appearing within 5 minutes of administration. Choline C-11 undergoes metabolism, with 11C-betaine detected as the major metabolite in blood. The rate of excretion of Choline C-11 in urine was 0.014 mL/min.

Choline C-11 has been developed and marketed by Mayo Clinic.

Full prescribing information is found here.

Browsers and Bugs

We had a support email recently reporting that some things on the interface (an export function) didn't work with Chrome - we couldn't reproduce the issue with the equipment we have here at ChEMBL Towers. But there are a lot of OSs and a lot of browsers out there, and we can't recreate every possible environment - interestingly, Chrome is really popular amongst you people (the image above is a Google Analytics report of a week's access to this very blog). I'm a Safari man myself....

So as a reminder, we love hearing about bugs and issues, we really do, so do send them in!

Thursday, 29 November 2012

ChEMBL Cross Reference Links Now In UniProt

So, some great news for those of you that use UniProt - there are now links to the corresponding target pages in ChEMBL in there.

Here's the link to the list of ChEMBL targets that are in UniProt. And there are links to ChEMBL in the Cross References section.


Saturday, 24 November 2012

A 101 Thankyou's!

This week, our ChEMBL NAR Database paper made the milestone of over a hundred citations (in less than a year too). This made us all very, very happy, and for a few moments, we rested our fingers from our keyboards, and used them instead to grasp a mug of coffee/tea; but only for a few seconds, before we got back to mixing and baking and cooking ChEMBL 15 for you all.

Here's a list of the current citations of Gaulton et al., NAR Database, 40, D1100-D1107, 2012. Remember this is an Open Access paper.

Please keep on keeping us happy by using our work, it's probably the biggest satisfaction we can get :)

Friday, 23 November 2012


As part of the ChEMBL group's involvement in the OpenPhacts project, a representative from the ChEMBL team will be attending SWAT4LS next week. As well as hacking and learning about new Semantic technologies, there may be time to catch up with ChEMBL users also attending the workshop. So if you would like to hear about what we are doing with the Semantic Web or RDF, or just have a general chat about ChEMBL, please get in touch.

Sunday, 18 November 2012

New Drug Approvals 2012 - Pt. XXV - Tofacitinib citrate (XELJANZ®)

On November 6, the FDA approved Tofacitinib citrate (Trade Name: XELJANZ®; Research code: CP-690550, ChEMBL: CHEMBL221959, PubChem: CID9926791, DrugBank: DB08183, ChemSpider: 8102425) to treat moderately to severely active Rheumatoid Arthritis (RA). It is orally administered and may be used as a monotherapy or in combination with non-biologic DMARDs. About 1% of the world-wide population is affected by rheumatoid arthritis. RA predominantly affects women (who are three times more susceptible than men) and is more frequent between the ages of 40 and 50, but people of any age can be affected.

Other approved drugs in this commercially competitive sector include Adalimumab (Trade Name: Humira, ChEMBL: CHEMBL1201580, DrugBank: DB00051), Etanercept (Trade Name: Enbrel, ChEMBL: CHEMBL1201572, DrugBank: DB00005) and Infliximab (Trade Name: Remicade, ChEMBL: CHEMBL1201581, DrugBank: DB00065).

IUPAC Name: 3-(4-methyl-3-(methyl(7H-pyrrolo[2,3-d]pyrimidin-4-yl)amino)piperidin-1-yl)-3-oxopropanenitrile
Canonical SMILES: C[C@@H]1CCN(C[C@@H]1N(C)c2ncnc3[nH]ccc23)C(=O)CC#N
Standard InChI: InChI=1S/C16H20N6O/c1-11-5-8-22(14(23)3-6-17)9-13(11)21(2)16-12-4-7-18-15(12)19-10-20-16/h4,7,10-11,13H,3,5,8-9H2,1-2H3,(H,18,19,20)/t11-,13+/m1/s1

XELJANZ is the citrate salt of tofacitinib, an inhibitor of Janus Kinase (JAK), an intracellular tyrosine kinase which transmits signals from cytokine or growth factor-receptor interactions on the cell membrane to influence cellular processes of hematopoiesis and immune cell function. Within the signalling pathway, JAKs phosphorylate and activate Signal Transducers and Activators of Transcription (STATs), which modulate intracellular activity including gene expression. The JAK-STAT system is a major signalling alternative to the second messenger system.

XELJANZ is specifically designed to inhibit the JAK pathways, which are signalling pathways inside the cell that play an important role in the inflammation involved in RA. Tofacitinib modulates the signalling pathway and prevents the phosphorylation and activation of STATs. JAK enzymes transmit cytokine signalling through pairing of JAKs (e.g., JAK1/JAK2, JAK1/JAK3, JAK1/TYK2, JAK2/JAK2). Tofacitinib has in vitro activities against JAK1/JAK2, JAK1/JAK3 and JAK2/JAK2 combinations.

The picture on the left is PDB entry 3lxn, the crystal structure of TYK2 (in red) with CP-690550 (in blue/green), and the picture on the right is PDBe entry 3lxk, the crystal structure of JAK3 (in red) with CP-690550 (in blue/green).

The recommended dosage is 5 mg orally. It has an apparent volume of distribution of 87 L, protein binding of approximately 40%, a bioavailability of 74% and an elimination half-life of approximately 3 hrs. Metabolism of Tofacitinib is mediated mainly by Cytochrome P450 3A4 (CYP3A4), with a minor contribution from Cytochrome P450 2C19 (CYP2C19). Clearance is estimated to be 70% hepatic and 30% renal.
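
As a rough illustration of what a half-life of about 3 hrs means in practice, here is a minimal Python sketch. It assumes simple first-order (exponential) elimination, which is our simplification and not something stated in the prescribing information; the function name is made up for this example.

```python
def fraction_remaining(t_hours, half_life_hours):
    """Fraction of drug left after t_hours, ASSUMING first-order elimination."""
    return 0.5 ** (t_hours / half_life_hours)


# With the ~3 hr half-life quoted above, four half-lives pass in 12 hours.
print(fraction_remaining(12, 3))  # 0.0625
```

So under this assumption only about 6% of a dose would remain 12 hours later. Treat it as back-of-the-envelope arithmetic, not as pharmacokinetic modelling.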

Full prescribing information can be found here.

License holder is Pfizer and product website is

Wednesday, 14 November 2012


The testing and QC pixies have been really busy with Open ChEMBL - the OSDODS virtual machine appliance - this has made us aware of the support questions we're likely to get, and so we'd like to build our knowledge-base of support issues with a small pilot release, prior to facing a lot of queries about VirtualBox, ifconfig etc.

If you are interested in getting early access to Open ChEMBL, and have experience in configuring VMs in heterogeneous environments - please get in touch.

Tuesday, 13 November 2012

DNDi screens MMV’s open access Malaria Box

The Drugs for Neglected Diseases initiative (DNDi) and Medicines for Malaria Venture (MMV) announce today the identification of three chemical series targeting the treatment of deadly neglected tropical diseases (NTDs), through DNDi’s screening of MMV’s open access Malaria Box. The resulting DNDi screening data are among the first data generated on the Malaria Box to be released into the public domain, exemplifying the potential of openly sharing drug development data for neglected patients.

The open access Malaria Box is an MMV initiative launched in December 2011 to catalyse drug discovery for malaria and neglected diseases. It contains 400 molecules, selected by experienced medicinal chemists to offer the broadest chemical diversity possible and is available free of charge. In return, MMV requests that any data gleaned from research on the Malaria Box are shared in the public domain within two years. To date, more than 100 Malaria Boxes have been delivered to over 20 countries for research on diseases including malaria, neglected diseases, HIV and cancer.

DNDi, in partnership with the Laboratory for Microbiology, Parasitology and Hygiene (LMPH), University of Antwerp, screened all the compounds in the Malaria Box against the parasites responsible for the three NTDs on which DNDi mainly focuses: sleeping sickness (human African trypanosomiasis), leishmaniasis (including visceral leishmaniasis, or kala azar, also known as black fever), and Chagas disease. This initial screen identified two potential drug series for the treatment of sleeping sickness and one for leishmaniasis. The DNDi screens have yielded valuable information that will strengthen DNDi’s research pipeline. All the biological data from DNDi’s screen, together with the existing preliminary data from MMV, are now publicly available on the open-source ChEMBL database.

SMS-DrugNet Allosteric Regulators Workshop, Edinburgh, December 2012.

For many classically 'undruggable' targets, there is sometimes the prospect of discovering and optimising allosteric regulators. These can offer more selective target regulation, or improved drug-like properties for compounds that bind to the allosteric site. However, allosteric regulators are often discovered by serendipity, and many screens are not configured optimally to identify them.

As part of the grant we are involved in, there is an Allostery Workshop taking place at the University of Edinburgh on 4th December 2012. The Workshop, sponsored by the British Council, involves an extensive delegation of scientists from Turkey led by Burak Erman, Koc University, Istanbul and will bring together a diverse range of disciplines including Biology, Chemistry, Computer Science, Informatics, Mathematics and Medicine. The program for the day will include presentations and a poster session.

Gerard Van Westen from the group here will be presenting some of our recent work on annotation and classification of allosteric/orthosteric regulators in ChEMBL. The title of his talk is "Beating (the) Competition in Lead Discovery: Property and Structural trends in Allosteric Regulators"

Further details are available on this link.

Update: Space is limited, so if you are interested in going, register early!

Friday, 9 November 2012

ChEMBL Virtual Machine

Next week we will be releasing ChEMBL Virtual Machine. We have referred to it in a previous post and had hoped to make it available this week, but as always with best laid plans....

So, we are using this post to generate some pre-release excitement :) and also to acknowledge the hard work of Rodrigo Ochoa, who worked on this project during his 5-month internship with the ChEMBL group.

We will be providing a lot more detail in next week's blog post, but as a quick summary the VM is based on an Ubuntu Linux build and comes preloaded with ChEMBL_14 (in PostgreSQL), RDKit and a web application, which makes use of Marvin and allows users to easily get started with querying the ChEMBL data.

New Drug Approvals 2012 - Pt. XXIV - Ocriplasmin (Jetrea™)

On October 17, the FDA approved Ocriplasmin (tradename: Jetrea; Research Code: Microplasmin), a proteolytic enzyme indicated for the treatment of symptomatic vitreomacular adhesion (VMA). VMA is a condition of the eye that results from the liquefaction of the vitreous gel within the human eye and consequent adhesion to the retina. As the eye ages, the vitreous humor can naturally separate from the retina. However, if the separation is not complete, areas of adhesion can occur. The traction from these adhesion areas on the retinal surface is the underlying pathology of symptomatic VMA, which can lead to ocular damage. Ocriplasmin is the first drug approved to treat this condition and it exerts its therapeutic action by selectively breaking down the three major protein components of the vitreous body and vitreoretinal interface (fibronectin, laminin and collagen), thereby dissolving the protein matrix responsible for VMA. The only alternative treatment is a surgical procedure to remove the vitreous from the eye, known as vitrectomy.

Ocriplasmin is a recombinant truncated form of human plasmin (ChEMBL: CHEMBL1801; UniProt: P00747) with a molecular weight of 27.2 kDa, produced by recombinant DNA technology in a Pichia pastoris expression system.

>PLMN_HUMAN Plasminogen (Ocriplasmin is displayed in bold)

The PDBe entry (PDBe:1bui) for the crystal structure of Ocriplasmin (in green) in complex with Staphylokinase (in red) is displayed below.

The recommended dosage of Ocriplasmin is 0.125 mg (0.1 mL of the diluted solution) administered by intravitreal injection to the affected eye once as a single dose.

The license holder is ThromboGenics, Inc. and the full prescribing information of Ocriplasmin can be found here.

Sunday, 4 November 2012

New Drug Approvals 2012 - Pt. XXIII - Omacetaxine mepesuccinate (SYNRIBO™)

ATC code: L01XX40
Wikipedia: Omacetaxine_mepesuccinate

On October 22nd 2012 the FDA approved omacetaxine mepesuccinate (research code: CGX-635, trivial name: Homoharringtonine, trademark: Synribo™) for the treatment of chronic or accelerated phase chronic myeloid leukaemia (CML) in adults with resistance to two or more tyrosine kinase inhibitors. Omacetaxine is an old drug, identified 35 years ago and known to have activity in CML, but its clinical development was previously halted due to the discovery of BCR-ABL and other targeted kinase inhibitors (PubMed: 21294709). The rapid development of tyrosine kinase inhibitor resistant tumors has led to the need for agents that can act in these treatment-derived drug-resistant patients. Omacetaxine mepesuccinate has been approved based on observed major cytogenetic response rather than on improvement in disease-related symptoms or increased survival.

Omacetaxine mepesuccinate/homoharringtonine is a cephalotaxine ester prepared by a semi-synthetic process from cephalotaxine, an extract from the leaves of Cephalotaxus sp. found in Southern China. Its molecular formula is C29H39NO9, with an IUPAC name of 4-methyl (2R)-hydroxyl-2-(4-hydroxyl-4-methylpentyl) butanedioate, and its molecular weight is 545.6 Da. It is absorbed following subcutaneous administration, and maximum plasma concentrations (Tmax) are achieved after ~30 minutes. The volume of distribution (Vd) is variable, at 141 +/- 93.4 L following subcutaneous administration of 1.25 mg/m2 twice daily for 11 days.

The mechanistic target of omacetaxine mepesuccinate is not fully known; however, its action includes inhibition of protein synthesis. It was found to bind to the A-site cleft in the peptidyl-transferase center of the large ribosomal subunit in archaebacteria and is understood to prevent the correct positioning of amino acid side chains of incoming aminoacyl-tRNAs, thus inhibiting the elongation step of peptide synthesis.

It was shown in vitro to reduce the protein levels of a number of oncogenic proteins including BCR-ABL and MCL1. Because it does not bind BCR-ABL directly, it is active in both wild-type tumours and resistant tumours harbouring the ABL drug resistance mutant T315I, as seen in mouse models.

Prescribing information is here

Synribo™ is marketed by IVAX Pharmaceuticals.

Paper: Mapping small molecule binding data to structural domains

We've just published a paper on mapping the sites of small molecule binding in complex multidomain proteins (pdf here - this link doesn't seem to work at the moment, sorry). The resolution of the mapping is at the level of Pfam domains. We love Pfam, and love it even more that the Pfam team is moving to the EBI this week. The motivation for this work is multifold, and it addresses a pretty big problem in chemogenomics.

Firstly the issue of domain frustration - you search a protein containing a series of distinct domains looking for homologues in ChEMBL. If your protein contains a common and uninteresting domain, something like a zinc finger or EGF domain (our interest is for small molecule binding remember, we're not saying that these domains are completely boring, they're just a lot less interesting from a chemical biology/drug discovery perspective) you'll retrieve a whole bunch of sequence related, but small molecule binding unrelated data. It's just the way bioinformatics works. You can be selective in searching with just the sequence of the domain you are interested in, but this only solves half the problem, since there's no guarantee that compounds retrieved will bind at that domain in the retrieved protein.

Secondly the issue of selective signalling - it is now common to see proteins as existing in pathways, with key control nodes that are highly connected to many other proteins. A key question is therefore how signalling remains specific under this everything-is-linked-to-almost-everything-by-interactions setting. Well, we think that the annotation of pathways at the domain interaction level is under-appreciated. For example, consider a protein X that interacts with three other proteins Y, W and Z, where Y and Z interact through domain 1 in X, and W via domain 2. If you have a small molecule that binds to domain 1 in X, you will selectively affect downstream signalling to Z and Y only.
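
The X/Y/W/Z example can be written down as a tiny lookup. This is only a sketch of the idea in Python, not the method used in the paper; the dictionary and function names are invented here for illustration.

```python
# Hypothetical interaction map for the toy example in the text:
# partner protein -> the domain of X that mediates the interaction.
INTERACTIONS_OF_X = {"Y": "domain 1", "Z": "domain 1", "W": "domain 2"}


def affected_partners(interactions, bound_domain):
    """Return the partners whose interaction with X runs through bound_domain."""
    return sorted(
        partner
        for partner, domain in interactions.items()
        if domain == bound_domain
    )


# A small molecule binding domain 1 of X touches signalling to Y and Z, not W.
print(affected_partners(INTERACTIONS_OF_X, "domain 1"))  # ['Y', 'Z']
```

Annotating the interaction at the protein level alone would lump Y, Z and W together; keeping the domain in the record is what makes the selective effect visible.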

These data will end up in the next version of ChEMBL - but if you want to get hold of any data prior to this, check out the supplementary data for the paper.

Next steps? Well, the approach to mapping and scoring the domains could be improved, and the resolution ideally needs to be at a site level within a domain - so that compounds that bind at different structural sites can be differentiated for model development, pharmacophores, etc. It is an undeniable fact that compounds that bind at different sites, with different binding determinants, will not be able to predict each other if they are merged into a single model. We have made some progress on this latter sub-problem, and more on that soon.

%T Mapping small molecule binding data to structural domains
%A F.A. Kruger
%A R. Rostom
%A J.P. Overington
%J BMC Bioinformatics 
%D 2012
%V 13
%O doi:10.1186/1471-2105-13-S17-S11

Thursday, 1 November 2012

New Drug Approvals 2012 - Pt. XXII - Perampanel (Fycompa™)

ATC Code: N03AX22
Wikipedia: Perampanel

On October 22nd 2012 the FDA approved Perampanel (research code: E2007, ER-155055-90, trade name Fycompa, CHEMBL1214124). Perampanel is an orally administered drug to be used as an adjunctive therapy for the treatment of partial-onset seizures with or without secondary generalized seizures in patients with epilepsy.

Epileptic seizures are defined as "abnormal excessive or synchronous neuronal activity in the brain". The net symptoms can be very diverse, from severe thrashing movements to a very mild brief loss of awareness. Approximately 4% of the population will have experienced an unprovoked seizure by the age of 80, with a 30-50% chance of repeat in this group. Seizures can last from a few seconds to a state of life threatening persistent seizure (known as status epilepticus).

Approximately 25% of the people suffering from a seizure or status epilepticus will be diagnosed with epilepsy. Treatment may reduce the chance of a second seizure by as much as 50%.

Perampanel acts by non-competitively inhibiting the ionotropic α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid (AMPA) glutamate receptor. This receptor consists of a heteromeric combination of 2 out of 4 known subunits, GluR-1 to GluR-4 (respectively: CHEMBL2009, CHEMBL4016, CHEMBL3595 and CHEMBL3190, or UniProt P42261, P42262, P42263 and P48058). Of these, the combinations GluR-1/GluR-2 and GluR-2/GluR-3 are the most frequent. The specific mechanism by which Perampanel exerts its antiepileptic effect in humans has not been fully elucidated.

Fycompa is a small molecule drug with a molecular mass of 349.4 g/mol, an AlogP of 3.57 and 3 rotatable bonds, and does not violate the rule of 5.
Canonical SMILES: O=C1N(C=C(C=C1c2ccccc2C#N)c3ccccn3)c4ccccc4
InChI: InChI=1S/C23H15N3O/c24-15-17-8-4-5-11-20(17)21-14-18(22-12-6-7-13-25-22)16-26(23(21)27)19-9-2-1-3-10-19/h1-14,16H
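
The quoted molecular mass can be checked by hand from the formula layer of the InChI (C23H15N3O). Below is a minimal Python sketch of that arithmetic. The atomic weights are rounded standard values and the helper name is ours; a real cheminformatics toolkit would do this more carefully (isotopes, charges, nested formulas).

```python
import re

# Rounded standard atomic weights (g/mol); only the elements needed here.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def molecular_mass(formula):
    """Sum atomic weights for a flat formula string such as 'C23H15N3O'."""
    mass = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing count means a single atom of that element.
        mass += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return mass


print(round(molecular_mass("C23H15N3O"), 1))  # 349.4
```

The result agrees with the 349.4 g/mol given above, which is comfortably under the 500 Da limit in the rule of 5.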

The recommended starting dose of Perampanel in the absence of other CYP3A4 enzyme-inducing antiepileptic drugs is 2 mg once daily taken orally at bedtime (and can be incremented to the recommended dose range of 8 mg to 12 mg once daily). The recommended starting dose of Perampanel in the presence of CYP3A4 enzyme-inducing antiepileptic drugs is 4 mg and patients should be monitored closely for response.

Perampanel is extensively metabolized via initial oxidation and sequential glucuronidation. Oxidative metabolism is mediated by CYP3A4 and/or CYP3A5 based on results of in vitro studies using recombinant human CYPs and human liver microsomes. Other CYP enzymes may also be involved.

The license holder is Eisai Inc. and the full prescribing information can be found here.

Tuesday, 30 October 2012

Wellcome Trust Courses - Computational Resources For Drug Discovery 2013

Those of you who went on the course we ran this year will know how much fun it was - and from our perspective we're gonna keep on doing it till we get it right! So, once more, there is another chance to attend the course in 2013 - December 9 to 13th 2013 to be precise.

So if you are interested, pencil the dates in your diaries now, and set an automatic alarm for four months before, and check out the full course details then.

Of course, there are lots of other excellent courses in the same series, and the poster is available for download to display on your office wall here.

The First Rule of Security Club is that you do not talk about Security Club

We worry about data security and privacy, a lot. I fret and sweat over this, and it is one of the things (alongside being late with EU reports) that genuinely keeps me awake at night, and that you can never know too much about (again a bit like the EU). We have started to collect examples of security and data privacy issues and vulnerabilities in online chemistry-related resources. Firstly, to build a set of real world examples, and to establish best practice for our own developers. It also allows us to potentially create an environment in which security and privacy matters can be privately discussed without the world being unnecessarily alerted to them; allowing fixes to be made, and generally keep the online chemistry world a better safer place.

As would be expected for this sort of thing, the list will not be open, and not indexed in Google (if it is right now, we’ve failed at step one!), so if you’re interested in joining the list, and your job involves the building/maintenance of online chemistry systems with a security/privacy responsibility, get in touch in the normal way.

Clinical Development Candidate Annotatathon - July 2013

We are thinking of holding an annotatathon for clinical development stage compounds next July, here on campus at the EBI in Hinxton. At this event we will assign/curate efficacy targets for all the clinical stage compounds we have by then identified, simplifying the work by pre-clustering by chemical class/therapeutic area. Data generated during the event will be placed online immediately, and would of course be fully Open (none of this frustrating, online access only for us!).

If there is interest in taking part, and contributing to this effort, let me know! Depending on the level of interest, I may apply for funding to help with travel/accommodation. If you are interested in funding this, we'd be delighted to hear from you.

Monday, 29 October 2012

PubMed² - Experimenting with biomedical literature for tablets and smart phones

We're still playing around with data visualisation, and the experiment of this week focuses on the scientific literature and is designed with tablet devices (such as the iPad or the Nexus 7) and smartphones in mind. The application is a re-thinking of PubMed's search interface and you can get to play with it here at

Let us know in the comments what you think.

Masters Project - Ion-channel structural pharmacology

We have a position in the group in the area of ion-channel structural pharmacology - mapping known ion-channel modulators to sequences and binding sites. This will be in partnership with Pfizer, and the role will involve time spent both at the EBI and at Pfizer's labs in the Cambridge UK area - so a great opportunity to pick up some industrial experience.

If you are interested, please get in touch by December 15th 2012, when we will shortlist candidates for interview.

Sunday, 28 October 2012

Random Notes on Open Drug Discovery/Data Sharing: Part 1

There are some fantastic initiatives in Open Drug Discovery going on at the moment. I, for one, am convinced that we are on the cusp of a large structural change in drug discovery, and like at the beginning of all revolutions, the future is not clear, and we are all a little bit excited and nervous at the same time. One of the commonly quoted benefits of an Open strategy is that it can avoid duplication, and if you avoid duplication, it means that you get to the goal faster and cheaper (since other researchers can explore alternative approaches), and there is no repetition. There, you've just read it, and it's quite seductive isn't it?

I've never quite bought this "avoid duplication" argument for the following three reasons. (I should declare my political/philosophical hand here, I have a very deep rooted empathy with the concept of The Free Market. Not the goofy, fudged form that we've had in Western Economies for some time - but that really is a different story, for another time).

1) Scientists are not perfect and they mess things up now and then. The "no duplication" strategy places a lot of weight on the capabilities of a single group, who may not follow the best decision making, have the best approach to data analysis/design etc. There is a lot of discussion at the moment in the literature of the non-repeatability of key pharmacology data; to not have several parallel attempts at a problem seems a little rash given the probably high likelihood of individual failure. If an individual group has a likelihood of 0.6 of getting something done within a given time and given funding, then two groups (with the same likelihood of success/failure) working independently in parallel have a probability of getting it done of 1 - 0.4 x 0.4 = 0.84. Simples.

2) Competition is well established to be one of the major drivers of rapid completion in almost all endeavours of life; if you have someone breathing down your neck, potentially scooping you on a paper, you think in a different way, and tend to stay focussed on the task in hand. Given a finite time to complete a piece of work with preplanned and coordinated deliverables, the work miraculously fills the time and funding available.

3) Who will take the decisions over non-duplication, effectively saying you will not work on this compound series, and another group will? And will people abide by the decisions? We all know that grant committees are useless (unless we are on them of course), and without a lot of process transparency, things could rapidly descend into slow chaos and confusion.

However, I think the arguments for rapid data sharing are very very strong, primarily because they increase liquidity and transparency in the market, and allow market participants to take more rational decisions on the allocation of their resources (individual labs and funders). For me this is the biggest single reason for data sharing (i.e. it actually increases competition, not decreases it). The Free Market of Knowledge in Drug Discovery will drive participants to their best composite roles, based on their abilities.
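
For anyone who wants to play with the numbers in point 1, here is a minimal Python sketch of the parallel-groups arithmetic. It assumes the groups succeed or fail independently of each other, which is a simplification on our part (real groups share literature, reagents and fashions); the function name is just for illustration.

```python
def p_at_least_one_success(p_single, n_groups):
    """Chance that at least one of n_groups succeeds, ASSUMING independence."""
    # All groups fail with probability (1 - p)^n; take the complement.
    return 1.0 - (1.0 - p_single) ** n_groups


for n in (1, 2, 3):
    print(n, round(p_at_least_one_success(0.6, n), 3))
# 1 0.6
# 2 0.84
# 3 0.936
```

The gain from each extra group shrinks quickly, so this is an argument for a little duplication, not for unlimited duplication.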

Saturday, 27 October 2012

Paper: Cheminformatics - Communications of the ACM

Here is a review article on cheminformatics, written as an orientation piece for people from a computational sciences background.

%T Cheminformatics
%A J.K. Wegner
%A A. Sterling
%A R. Guha
%A A. Bender
%A J.-L. Faulon
%A J. Hastings
%A N. O'Boyle
%A J. Overington
%A H. Van Vlijmen
%A E. Willighagen 
%J Communications of the ACM
%V 55
%I 11
%P 65-75
%O DOI:10.1145/2366316.2366334

Thursday, 25 October 2012

ChEMBL - now with added DOIness

In order to provide ChEMBL users with a persistent and citable link to datasets that have been deposited in ChEMBL we have started registering DOIs (Digital Object Identifiers) for these datasets. Many of you will be familiar with the use of DOIs as identifiers for journal articles but they can be used for any document that you want to permanently identify and share with others. By doing this we are providing people with a way of citing a deposited dataset in exactly the same way as you would a scientific publication.

We are also hoping that by issuing DOIs for deposited data we will encourage people to contribute additional data to the ChEMBL database as the DOI will provide them with a permanent way to reference their contribution, for example by using the DOI in a subsequent publication.

At the moment we have DOIs for four of the deposited datasets in the ChEMBL database.  Two are results from screens on the GSK PKIS set and two are datasets measured as part of DNDi but we expect these to increase.  These datasets and their DOIs are shown below.

Compounds: GSK PKIS; Assays: Nanosyn kinase panel
Compounds: GSK PKIS; Assays: UNC Frye lab
Screening and optimization of specific chemical series against human African Trypanosomiasis (HAT)
Optimisation of fenarimol series for the treatment of Chagas disease

The DOIs can be resolved to the ChEMBL Document Report Card from the website
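
For those who want to resolve DOIs from a script, here is a minimal Python sketch that turns a DOI into a link via the public doi.org resolver. The example uses the DOI of our BMC Bioinformatics paper cited elsewhere on this blog, not one of the dataset DOIs, and the helper name is made up for illustration.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI proxy


def doi_to_url(doi):
    """Turn a DOI (optionally prefixed with 'doi:') into a resolvable URL."""
    cleaned = doi.strip()
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    # Keep '/' unescaped: it separates the DOI prefix from the suffix.
    return DOI_RESOLVER + quote(cleaned, safe="/")


print(doi_to_url("doi:10.1186/1471-2105-13-S17-S11"))
# https://doi.org/10.1186/1471-2105-13-S17-S11
```

The same function should work unchanged for a dataset DOI, since the resolver does not care whether the DOI points to an article or a deposited dataset.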

Open data for drug discovery: learning from the biological community

We've just co-authored with a collaborator from GSK an editorial on Open Data available here....

%T Open data for drug discovery: learning from the biological community.
%A A. Hersey
%A S. Senger
%A J.P. Overington
%J Future Medicinal Chemistry 
%D 2012
%I 10
%V 4
%P 1865-1867
%O DOI:10.4155/fmc.12.159

The picture of the Fifty Shades of Grey dog I found on the internet somewhere...

Sunday, 21 October 2012

Interest in a ChEMBL seminar in your lab in the South-East of the US next Spring?

I'm chairing a session at the ACS Spring meeting in New Orleans, in early April 2013 (the 7th thru 11th are the dates of the ACS meeting itself, but I'll probably be finished by the 9th) and am making a visit to Miami and Kentucky on the same trip. I can probably squeeze in two more lab visits if there is interest in a ChEMBL seminar (I would need to leave the US at the latest on Tuesday 16th April). I'll need to look into the practicalities of travel, and realistically they'll need to be in the South-East, but I'm pretty hardy in the air - I don't mind odd timed flights.

So if there is any interest, let me know.

Friday, 19 October 2012

Drug Approval Timeline Visualisation

We're playing around with some visualisation techniques at the moment for ChEMBL, with one of our interests being the display of timelines. Here is a little standalone visualisation of the timeline of FDA drug approvals, annotated with ATC codes, loaded with some toy data (so don't rely on it for structures/publications/analysis!).

Update: So the toolkit we use looks to be quite browser sensitive - we'll look into this, but by default, it looks like it doesn't render in Chrome.....

Tuesday, 9 October 2012

Brain-1.0 - Biomedical knowledge manipulation

The world of data informatics is seeing a cultural change, from a world of databases, such as ChEMBL, to one where data is more self-descriptive and ad hoc queryable - an evolution into knowledge-bases: the data will be organized around controlled dictionaries and ontologies (Semantic Web), more exposed to programmatic and web service infrastructures, and more robustly linked to other repositories (for a current example of a large nascent network of coordinated data repositories, see the ELIXIR project).

Brain is a library created to achieve such linkage: It can handle and query large biomedical knowledge-bases. The Brain library can also serve as a framework for users interested in Description Logic and Biology.

Website of the library: