

Showing posts from 2011

Omics and Personalised Healthcare - February 2012

We're speaking at the forthcoming EMBL Conference on Omics and Personalised Healthcare in Heidelberg, held 16th to 18th February 2012. We'll present some of our recent work on pharmacogenetic variation, covering both data-mining strategies and some of the implications for drug discovery.

Three ChEMBL talks at the March ACS in San Diego

A number of the ChEMBL team will be at the ACS in San Diego next March, giving three talks: one on drug safety (jpo), one on peptide SAR analysis and design (Patricia Bento), and the final one on UniChem (Jon Chambers). So hopefully see you there! We'll be spending some time in the lab of one of our close collaborators, Mike Gilson at UCSD, but we're happy to meet up with others, give talks, training, etc., as time and travel schedules permit. So if you're interested in this, get in touch. Update: Fixed post, to reflect three accepted talks.
PAPER ID: 20839 PAPER TITLE: "Drug combinations to reduce adverse drug reactions and improve intrapatient differences in response" DIVISION: CINF: Division of Chemical Information SESSION: Systems Chemical Biology and Other
PAPER ID: 22715 PAPER TITLE: "UniChem: A prototype unified chemical structure cross-referencing and identifier tracking system" DIVISION: CINF: Divi

Kinase SARfari ver.5.0 Released

We would like to announce the release of a new version of Kinase SARfari. The latest version has been updated using the chembl_12 data, which contains:
Bioactivity datapoints: 503,041 (+15%)
Compounds: 54,189 (+6%)
Various dumps of the data from Kinase and GPCR SARfari are also downloadable from the download page. Of course, we welcome feedback from our users!

Spotfire DecisionSite replacement

We used to use Spotfire DecisionSite for data visualisation, and liked it a lot. We've just found out that there are some pretty major changes to the licensing, and we are now looking for alternatives if we can't work out a way forward. So, any suggestions for good data exploration tools, preferably with some sort of chemistry capabilities? Ideally they would run natively on Mac OS X. Ideas in the comments please.....

Christmas is coming, the goose is getting fat, must be the time to register for TACBAC!

Rough Breakdown of Drug Classes for 2011

Here's the equivalent view of drug approvals for 2011 so far. It's a pretty similar picture in terms of the distribution of molecule classes, but remember that this set of drug approvals got their USANs assigned, on average, about 3 to 4 years ago, so the sets are not strictly comparable, and of course they suffer from small absolute numbers....

Happy ChEMBLmas!!

As part of our tradition of sending seasonal Xmas eCards to all our users, above is the December 2011 card. For the geeky amongst you, there may be an Easter Egg in the image file....

Conference: Advances in Protein-Protein Interaction Analysis and Modulation

Registration is now open for a really interesting EMBO Workshop on the modulation of protein-protein interactions (PPIs). We're speaking, and will present some new work on annotating ChEMBL with interacting domains, and a sort of classification of these into protein-protein interfaces (see some previous blog posts for further details). The meeting is in the beautiful French harbour town of Roscoff, the traditional home of the Onion Johnnies. Well, I love onions, my name is John, and I'll be there!

Amgen Scholars Program

Due to a lot of wasted time in the past, my lab does not host or support applications to the Amgen Scholars Program, so please do not apply to us!

Papers: MIRIAM and

The NAR Database Issue is currently in full flow, and there are many excellent articles. One important one for ChEMBL is this paper from the group of our good friend Nicolas Le Novère, at the EBI. It addresses a really important problem in biological and chemical data integration: the generation of unique and stable identifiers for records in a data collection. These are MIRIAM identifiers (MIRIAM is an acronym for Minimum Information Required In the Annotation of Models), held in the MIRIAM Registry. The paper also describes a new service, built upon the information stored in the MIRIAM Registry, which provides directly resolvable identifiers in the form of Uniform Resource Locators (URLs). Resources such as this are essential components for ad hoc, distributed queries across disparate data sources, and a core component for semantic web development. A link to the free pdf of the paper is here. ChEMBL is in,
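The idea of turning a registry entry into a directly resolvable URL can be sketched in a few lines. This is a minimal illustration only, assuming a registry entry holds a namespace and a regular expression that valid accessions must match; the `DataCollection` class, the `chembl.compound` namespace, the `^CHEMBL\d+$` pattern and the `resolver.example.org` base URL are all hypothetical, not values taken from the MIRIAM Registry or the service described in the paper.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCollection:
    """A hypothetical registry entry describing one data collection."""
    namespace: str   # short, stable name for the collection
    id_pattern: str  # regular expression every valid accession must match


# Hypothetical resolver base; a real service publishes its own URL scheme.
RESOLVER_BASE = "http://resolver.example.org"

# Illustrative entry only: the namespace and pattern are assumptions.
CHEMBL_COMPOUND = DataCollection("chembl.compound", r"^CHEMBL\d+$")


def resolvable_url(collection: DataCollection, accession: str) -> str:
    """Validate an accession against its collection, then build a URL for it."""
    if not re.fullmatch(collection.id_pattern, accession):
        raise ValueError(f"{accession!r} is not a valid {collection.namespace} identifier")
    return f"{RESOLVER_BASE}/{collection.namespace}/{accession}"


if __name__ == "__main__":
    print(resolvable_url(CHEMBL_COMPOUND, "CHEMBL25"))
```

Checking the accession against the collection's pattern before building the URL means a malformed identifier is rejected at source, rather than silently producing a link that cannot resolve.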

MIPTEC 2012 and EMBO Chemical Biology 2012

A quick post to say that two of the best conferences in Europe next year are coordinating speakers and schedules for their Drug Discovery sessions. This allows a really great line-up of talks and speakers, and the best use of your (always too small!) travel budgets in these economically tough times. More later, but pencil both conferences into your diary now!
MIPTEC 2012 - September 24th to 27th 2012, Basel, Switzerland
EMBO Chemical Biology - September 26th to 29th 2012, Heidelberg, Germany

Cape Town

I've just spent a great week in Cape Town, at UCT, visiting the lab of Kelly Chibale, where there's lots of activity in academic drug discovery, and also the Institute of Infectious Disease and Molecular Medicine. It was my first time in Africa, and it won't be my last!

3,788 Thank yous!

So a big and heartfelt three thousand, seven hundred and eighty-eight 'Thank Yous' to all the benevolent donors to the EBI Movember Team, The Bioinformoustachians. It was a lot of fun, the place is a lot hairier now, and there are more smiles on faces than at the start of the month. Mission accomplished by our International (hair)peace keeping force! The Victoria and George Crosses (respectively) are awarded to Francesco Iorio (sgt. 8th Italian light foot) and Remco Loos (cpl. 4th Dutch commando) for their epic struggle and hand-to-hand combat for first place in the fund-raising league table, and a 'mention in despatches' is presented to Nicolas Le Novère (lt. 1st French prancers) for being the 'Top Gun' amongst the faculty. The platoon is now returning home, and will soon be changing back into 'civvies' (much to the relief of their partners and families). If any other large science data centres want to make a fight of it for Movember 201

ChEMBL 12 Released

We are pleased to announce the release of ChEMBL_12. This latest version of the ChEMBL database contains:
1,222,969 compound records
1,077,189 distinct compounds
596,122 assays
5,654,847 bioactivities
8,703 targets
43,418 documents
7 data sources
This release includes updates to the manually extracted Medicinal Chemistry literature, a number of published ADMET datasets (for metabolic enzymes and various transporters), updates to Orange Book drug approvals and a partial update from PubChem BioAssay. ChEMBL_12 also contains a new deposited dataset of malaria liver-stage screening data from Novartis-GNF (for more details on this dataset, please see the ChEMBL-NTD website and the recent publication in Science). Please refer to the ChEMBL_12 release notes for a more detailed description of all changes included in this release. You can download the data from the ChEMBL ftpsite:
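A quick way to sanity-check a local copy of a release against headline figures like those above is to count rows per table. The sketch below uses Python's built-in `sqlite3` module and assumes a SQLite copy of the database; the mapping from each statistic to a table (`compound_records`, `molecule_dictionary`, `assays`, `activities`, `target_dictionary`, `docs`, `source`) is an assumption about the schema, and should be checked against the schema documentation shipped with the release you download.

```python
import sqlite3

# Assumed mapping from headline statistic to table name; verify against the release schema.
RELEASE_TABLES = {
    "compound records": "compound_records",
    "distinct compounds": "molecule_dictionary",
    "assays": "assays",
    "bioactivities": "activities",
    "targets": "target_dictionary",
    "documents": "docs",
    "data sources": "source",
}


def release_counts(conn: sqlite3.Connection, tables: dict = RELEASE_TABLES) -> dict:
    """Return the row count of each table, keyed by statistic name."""
    counts = {}
    for label, table in tables.items():
        # Table names cannot be bound as SQL parameters, so they come only from the trusted mapping.
        (n,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        counts[label] = n
    return counts


if __name__ == "__main__":
    # Toy in-memory database standing in for a real download.
    conn = sqlite3.connect(":memory:")
    for table in RELEASE_TABLES.values():
        conn.execute(f'CREATE TABLE "{table}" (id INTEGER)')
    conn.executemany('INSERT INTO "activities" VALUES (?)', [(1,), (2,), (3,)])
    print(release_counts(conn))
```

Against a real download, the same function would simply be pointed at a connection to the local database file.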


UniChem - An EBI compound structure cross-referencing resource

For some time we have faced issues with compound integration in ChEMBL, specifically the loading of compound sets into ChEMBL for cross-referencing between, for example, ChEBI, PDBe compounds, etc. The ChEMBL update cycle is relatively slow compared with some other resources, and there is inevitable thrash with compounds not being present, especially for exciting new data. Without doing something different for compound integration, we were starting to face a scenario where we had a compound table with many millions of compounds without any bioactivity data, followed by the inevitable slowdown in searching, etc. We also faced issues around curation of other people's primary data, changing compound structures, or their rendering, etc. So, we decided to set up an external system to resolve cross-references between various databases. This is a very simple Standard InChI lookup, containing compounds from resources such as ChEMBL, ChEBI, PDBe, Dru
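The core idea of a Standard InChI lookup is that two records in different resources describe the same structure when their Standard InChIs (or the hashed Standard InChIKeys) match. Below is a minimal in-memory sketch, assuming each source supplies (source, compound id, Standard InChIKey) rows and that each (source, id) pair maps to a single structure; the `CrossReferenceIndex` class, its method names, and the placeholder keys and identifiers are hypothetical and are not the UniChem API.

```python
from collections import defaultdict


class CrossReferenceIndex:
    """Hypothetical in-memory index joining compound records on Standard InChIKey."""

    def __init__(self):
        self._by_key = defaultdict(set)  # InChIKey -> {(source, compound_id), ...}
        self._key_of = {}                # (source, compound_id) -> InChIKey

    def add(self, source: str, compound_id: str, inchikey: str) -> None:
        """Register one compound record from one source."""
        self._by_key[inchikey].add((source, compound_id))
        self._key_of[(source, compound_id)] = inchikey

    def cross_references(self, source: str, compound_id: str) -> list:
        """Return records from other sources that share the same Standard InChIKey."""
        key = self._key_of.get((source, compound_id))
        if key is None:
            return []
        return sorted(rec for rec in self._by_key[key] if rec[0] != source)


if __name__ == "__main__":
    index = CrossReferenceIndex()
    # Placeholder keys and identifiers, for illustration only.
    index.add("chembl", "C-1", "KEY-A")
    index.add("chebi", "E-7", "KEY-A")
    index.add("pdbe", "P-3", "KEY-A")
    index.add("chebi", "E-9", "KEY-B")
    print(index.cross_references("chembl", "C-1"))  # [('chebi', 'E-7'), ('pdbe', 'P-3')]
```

Because the join happens on the structure key rather than inside ChEMBL's own compound table, a new source can be added without loading its compounds into ChEMBL, which is the problem the post describes.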

ChEMBL Widgets Update

We have made a couple of minor updates to the ChEMBL widgets, which include:
A new widget, which displays the bioactivity results shared between a ChEMBL compound and a ChEMBL target
A new scaling parameter, which allows you to vary the size of the widget
A more informative message when a widget has no data to display
More details can be found here

Further Depositions to ChEMBL-NTD

We're delighted to announce the availability of three new datasets on the ChEMBL-NTD portal, available for download, reuse, etc. These are:
Novartis-GNF Malaria Liver Stage dataset (associated with this Science publication) (Plasmodium falciparum)
DNDi Human African Trypanosomiasis (HAT) dataset (Trypanosoma brucei)
DNDi Chagas dataset (Trypanosoma cruzi)
Further details of the assays and compounds are to be found on the ChEMBL-NTD portal. The data are available now via direct download links, and will be integrated and loaded into a future version of ChEMBL. Once more, we thank the depositors, DNDi and Novartis-GNF, for their benevolence and commitment to Open Science. The associated publication for the Novartis-GNF dataset is: Imaging of Plasmodium Liver Stages to Drive Next-Generation Antimalarial Drug Discovery. S. Meister, D.M. Plouffe, K.L. Kuhen, G.M.C. Bonamy, T. Wu, S.W. Barnes, S.E. Bopp, R. Borboa, A.T

Interest in Links to Patents From Structures in ChEMBL

We are exploring establishing links from ChEMBL compounds to patents. The implementation could take two basic routes:
Links from the interface to patents (simple and quick to do, now that we have UniChem).
Patent URIs in the database itself (more complex, and more difficult to keep up to date, but arguably more useful).
So, to help our planning for next year, comments and wishes are most welcome....

New Drug Approvals 2011 - Pt. XXXI - Asparaginase Erwinia chrysanthemi (Erwinaze™)

ATC code: L01XX02 On November 18th, the FDA approved asparaginase from Erwinia chrysanthemi for the treatment of patients with acute lymphoblastic leukemia (ALL) who have become allergic to the E. coli asparaginase that is conventionally used to treat ALL. ALL is a cancer of the white blood cells and can be fatal within weeks of onset if left untreated. In ALL, there is a disproportionate increase in the population of immature white blood cells, which crowd out functional immune cells as well as red blood cells and platelets, and in advanced stages of the disease infiltrate tissues and organs, most frequently the liver, spleen and lymph nodes. The symptoms of ALL in its initial stages are fatigue, anemia, frequent infections and fever, as well as breathlessness and prolonged bleeding. ALL is caused by DNA damage and is associated with exposure to radiation and carcinogenic chemicals. There are a number of typical chromosomal tr

New Drug Approvals 2011 - Pt. XXX - Aflibercept (Eylea™)

ATC code (partial): S01LA On November 18th 2011, the FDA approved Aflibercept (trade name: Eylea; Research Code: AVE-0005, also known as VEGF Trap), a recombinant fusion protein indicated for the treatment of patients with neovascular (wet) age-related macular degeneration (AMD). AMD is an eye condition which usually occurs in older patients and affects the macular area of the retina, causing loss of vision and eventually blindness. In particular, wet AMD is characterised by an abnormal growth of new blood vessels (neovascularisation) behind the retina. This originates from an abnormal activation of angiogenesis, by the vascular endothelial growth factor-A (VEGF-A; ChEMBL: CHEMBL1783; Uniprot: P15692) and the placenta growth factor (PlGF; ChEMBL: CHEMBL1697671; Uniprot: P49763), of the vascular endothelial growth factor receptors VEGFR-1 (ChEMBL: CHEMBL1868; Uniprot: P17948) and VEGFR-2 (ChEMBL: CHEMBL279; Uniprot: P35968), two receptor tyrosine k

New Drug Approvals 2011 - Pt. XXIX - Ruxolitinib phosphate (Jakafi™)

ATC Code: L01XE18 Wikipedia: Ruxolitinib On November 16th 2011, the FDA approved ruxolitinib phosphate (Tradename: Jakafi™; Research Code: INCB-018424), a JAK1/JAK2 inhibitor for the treatment of patients with intermediate or high-risk myelofibrosis, including primary myelofibrosis, post-polycythemia vera myelofibrosis and post-essential thrombocythemia myelofibrosis. Myelofibrosis is a disorder of the bone marrow, in which the marrow is replaced by scar (fibrous) tissue. Scarring of the bone marrow reduces its ability to produce blood cells, and can lead to anemia, bleeding problems, and a higher risk of infections due to reduced numbers of white blood cells. It is also associated with engorgement of organs such as the spleen and liver. Primary myelofibrosis may develop into secondary conditions, including leukemia and lymphoma. Myelofibrosis is associated with dysregulated Janus kinases JAK1 and JAK2, and some cases with a somatic mutation in JAK2 (JAK2V617F) (OMIM). JAK signaling in


A quick reminder of the TACBAC 2012 conference . Previous conferences in the series have been excellent, and so check out the website for some initial conference details.

Molecular Architecture of the Human ADMET System

Here is an interesting graph: it is the frequency distribution of the functional Pfam domains for the human ADMET system. More specifically, it is the distribution of domain frequencies for the single-domain proteins (the multidomain set is being curated now). The source data come from the PharmADME site (the graph includes the "extended set"). So just 10 distinct functional domains cover almost 75% of the domains (there are a total of 46 domains in this set). By far the most frequent domain is, unsurprisingly, the cytochrome P450 domain (PF00067).
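A figure like "10 distinct domains cover almost 75% of the domains" is a cumulative-coverage calculation: sort the domains by frequency, take the top k, and divide their summed counts by the total. A minimal sketch, assuming the input is a mapping from Pfam accession to the number of occurrences of that domain; the counts in the usage example (and the `PF_A`-style placeholder accessions) are made up for illustration and are not the PharmADME data.

```python
from collections import Counter


def top_k_coverage(domain_counts: dict, k: int) -> float:
    """Fraction of all domain occurrences accounted for by the k most frequent domains."""
    total = sum(domain_counts.values())
    if total == 0:
        return 0.0
    top = Counter(domain_counts).most_common(k)  # k (domain, count) pairs, largest first
    return sum(count for _, count in top) / total


if __name__ == "__main__":
    # Made-up counts for illustration only.
    counts = {"PF00067": 50, "PF_A": 20, "PF_B": 10, "PF_C": 10, "PF_D": 10}
    print(top_k_coverage(counts, 2))  # 0.7
```

Running the same function over k = 1, 2, 3, ... gives the cumulative curve, and reading off where it crosses a threshold answers "how few domains explain most of the system".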

Deadline Approaching for Current Recruitment in ChEMBL

The deadline is approaching for two posts in the ChEMBL group: one a web developer, and the other a data integration position. The posts are three-year fixed-term contracts. The closing date for applications is the 27th November 2011. Further details should be available here: Web Developer (EBI_00145), Data Integration (EBI_00144). If you have any questions, please feel free to contact us.

Recommendations for a MySQL Chemical Data Cartridge?

What options are there for a MySQL Chemical Structure Cartridge? The constraint is that the license needs to be Open (to commercial and non-commercial users). Post away in the comments, then everyone can see the answers. Update: for a little background on our specific interests, we wish to build a deployable and distributable version (a package or VM) with a preconfigured and loaded current ChEMBL database, capable of full chemical search. Deployment could be as a Linux-style package, or as an Amazon EC2 instance. Our internal systems here at the EBI are based on Oracle and the MDL (or whatever the current name is :) ) Direct cartridge. This configuration is sometimes beyond the reach of many budgets, and so we are interested in exploring a 'free' but useful version of ChEMBL. Update 2: So Postgres opens up quite a few more options....

Movember Donations from Outside the United Kingdom

In response to a question from one of the ChEMBL-og readers, I've just checked, and it is possible for non-UK residents to donate to the EBI Movember team, The Bioinformoustachians. The link for donations is above.

The Start of Movember - Clean Shaved and Ready To Grow!

Sorry if these posts are a little off our normal topic. But most members of the EBI's Movember team assembled yesterday for a 'before' photo. Just look at all those baby faces, look at all those chins!

Meeting: Drug Discovery Oxford 2012 - Drug discovery: a job too complex for academia or industry alone?

There is a great meeting being held in Oxford just after the Christmas break (specifically on the 5th and 6th January 2012). It's organised by the Structural Genomics Consortium, one of the longest-standing advocates of Open Science in Drug Discovery. Details of the meeting are here.

Meeting: 7th German Conference on Chemoinformatics

A meeting that may be of interest to many of you: GCC 2011, the 7th German Conference on Chemoinformatics, to be held from November 6th to 8th 2011 in Goslar, Germany.

The Bioinformoustachians!

The EBI has a team entered for the annual Movember fund-raising event, which is focussed on raising money for 'male cancers', testicular and prostate. The team is called The Bioinformoustachians, so please consider sponsoring us over the coming month. Of course, as well as raising money for a serious cause, there will be some fun along the way. Watch over the next few days as the full team signs up; we're gonna raise Loadsamoney (hopefully). For those unfamiliar with the idea, participants start the month clean shaven and grow a moustache throughout it, gaining sponsorship for looking stylish/silly. The team webpage on the Movember website is here. Please consider donating, and help us, together, make a difference.

New Drug Approvals 2011 - Pt. XXVIII - Clobazam (Onfi™)

ATC Code: N05BA09 Wikipedia: Clobazam On October 24th, the FDA approved Clobazam (Tradename: Onfi™; Research Code: RU-4723), a GABA-A receptor agonist, for the adjunctive treatment of seizures associated with Lennox-Gastaut syndrome (LGS) in patients aged two years or older. Lennox-Gastaut syndrome is a rare and severe form of epilepsy that is typically diagnosed in childhood and often persists into adulthood. LGS accounts for 1-4% of childhood epilepsies, and it is associated with multiple types of seizures, as well as daily periods of frequent seizures. Clobazam decreases the frequency of LGS seizures by potentiating GABAergic neurotransmission through binding to the GABA-A receptor at the benzodiazepine site. The GABA-A receptor is a protein complex of five subunits (mainly α2β2γ) located in the synapses of neurons. All GABA-A receptors contain an ion channel that conducts chloride ions across neuronal cell membranes and two binding sites fo

New Drug Approvals 2011 - Pt. XXVII - Deferiprone (Ferriprox™)

ATC code: V03AC02 Wikipedia: Deferiprone On October 14th, 2011, the FDA announced the approval of Deferiprone (trade name: Ferriprox™) for the treatment of iron overload, a potentially fatal condition, in patients with thalassemia. Deferiprone is an oral iron chelating agent, which binds excess iron in the blood and thus makes it available for excretion from the body. Thalassaemia is an inherited (mostly autosomal recessive) blood disease that can lead to anemia by causing the formation of abnormal hemoglobin molecules that are not able to properly bind and release oxygen. Thalassaemia (OMIM: 141800 (α-) / 141900 (β-)) is sub-classified according to which of the subunits of the hetero-tetrameric (2α/2β, UniProt: P69905 / P68871) hemoglobin is affected, in contrast to sickle-cell anaemia (OMIM: 603903), which results exclusively from a specific mutation in the β subunit. The primary treatment of thalassaemia major, the severe form of β-thalassaemia, requires frequent blood transfusions

Recruitment - Two positions in ChEMBL team now available

We have two posts available in the ChEMBL group: one a web developer, and the other a data integration post. The positions are both for three years and will be EMBL staff contracts. The closing date for applications is the 27th November 2011. Further details should be available here (the links are quite fragile I'm afraid, so sorry if they do not keep working for long): Web Developer (EBI_00145), Data Integration (EBI_00144). If you have any questions, please feel free to contact us.

PhD studentship at the Institute of Cancer Research

From the lab of one of our collaborators comes the following...... Details of forthcoming PhD studentships at The Institute of Cancer Research (ICR) are now online. There are 12 studentships across a range of different disciplines including Biology, Chemistry, Informatics and Medical Physics. The deadline for applications is 1st December 2011. There is a specific studentship of likely interest to ChEMBL-og readers: Identifying novel targets and target combinations for cancer using in-silico chemical biology, within the Computational Biology and Chemogenomics Team of the ICR. This is a joint computational biology/lab biology PhD between Dr. Bissan Al-Lazikani and Prof. Paul Workman. The project is an exciting multi-disciplinary project that will utilise bioinformatics and chemogenomics techniques, protein interaction network modelling as well as laboratory biology to identify novel drug intervention targets (and compounds) for use in combination therapies with best-in

Want to help shape the future of ChEMBL?

We are planning to hold two half-day (fun) workshops in mid-February next year (i.e. the week of February 13th 2012). They are aimed at medicinal chemists and molecular modellers, and the idea is to develop easy-to-use workflows for several key tasks that drug discoverers often want to do, and could do more efficiently with the ChEMBL data. The workshops will be on campus here at Hinxton, and will start around 10.30am and finish around 3pm (lunch, coffee and cakes will be provided). The focus will be on:
Day 1 - Use of ChEMBL in lead optimisation
Day 2 - Use of ChEMBL for library design/compound purchase
If you are interested in helping, please mail us and tell us which session you'd most like to attend. Space will be limited to around 8 attendees.

Why Movember is Important

'Excluding skin cancer, prostate cancer is the most commonly diagnosed cancer among men in the US and the second most common cause of cancer death among men. It is estimated that about 1 in 6 men in the US will be diagnosed with prostate cancer during their lifetime and 1 in 36 will die from this disease.' Quote from Cancer Facts & Figures 2010. American Cancer Society