ChEMBL Resources


Thursday, 29 November 2012

ChEMBL Cross Reference Links Now In UniProt

So, some great news for those of you that use UniProt - there are now links to the corresponding target pages in ChEMBL in there.

Here's the link to the list of ChEMBL targets that are in UniProt. There are also links to ChEMBL in the Cross References section of each UniProt entry.


Saturday, 24 November 2012

A 101 Thankyou's!

This week, our ChEMBL NAR Database paper reached the milestone of over a hundred citations (in less than a year too). This made us all very, very happy, and for a few moments, we rested our fingers from our keyboards, and used them instead to grasp a mug of coffee/tea; but only for a few seconds, before we got back to mixing and baking and cooking ChEMBL 15 for you all.

Here's a list of the current citations of Gaulton et al., NAR Database, 40, D1100-D1107, 2012. Remember this is an Open Access paper.

Please keep on keeping us happy by using our work; it's probably the biggest satisfaction we can get :)

Friday, 23 November 2012


As part of the ChEMBL group's involvement in the OpenPhacts project, a representative from the ChEMBL team will be attending SWAT4LS next week. As well as hacking and learning about new Semantic technologies, there may be time to catch up with ChEMBL users also attending the workshop. So if you would like to hear about what we are doing with the Semantic Web or RDF, or just have a general chat about ChEMBL, please get in touch.

Sunday, 18 November 2012

New Drug Approvals 2012 - Pt. XXV - Tofacitinib citrate (XELJANZ®)

On November 6, the FDA approved Tofacitinib citrate (Trade Name: XELJANZ®; Research code: CP-690550, ChEMBL: CHEMBL221959, PubChem: CID9926791, DrugBank: DB08183, ChemSpider: 8102425) to treat moderately to severely active Rheumatoid Arthritis (RA). It is orally administered and may be used as a monotherapy or in combination with non-biologic DMARDs. About 1% of the worldwide population is affected by rheumatoid arthritis. RA predominantly affects women (who are three times more susceptible than men) and is more frequent between the ages of 40 and 50, but people of any age can be affected.

Other approved drugs in this commercially competitive sector include Adalimumab (Trade Name: Humira, ChEMBL: CHEMBL1201580, DrugBank: DB00051), Etanercept (Trade Name: Enbrel, ChEMBL: CHEMBL1201572, DrugBank: DB00005) and Infliximab (Trade Name: Remicade, ChEMBL: CHEMBL1201581, DrugBank: DB00065).

IUPAC Name: 3-(4-methyl-3-(methyl(7H-pyrrolo[2,3-d]pyrimidin-4-yl)amino)piperidin-1-yl)-3-oxopropanenitrile
Canonical SMILES: C[C@@H]1CCN(C[C@@H]1N(C)c2ncnc3[nH]ccc23)C(=O)CC#N
Standard InChI: InChI=1S/C16H20N6O/c1-11-5-8-22(14(23)3-6-17)9-13(11)21(2)16-12-4-7-18-15(12)19-10-20-16/h4,7,10-11,13H,3,5,8-9H2,1-2H3,(H,18,19,20)/t11-,13+/m1/s1

XELJANZ is the citrate salt of tofacitinib, an inhibitor of the Janus kinases (JAKs), intracellular tyrosine kinases which transmit signals from cytokine or growth factor-receptor interactions on the cell membrane to influence cellular processes of hematopoiesis and immune cell function. Within the signalling pathway, JAKs phosphorylate and activate Signal Transducers and Activators of Transcription (STATs), which modulate intracellular activity including gene expression. The JAK-STAT system is a major signalling alternative to the second messenger system.

XELJANZ is specifically designed to inhibit the JAK pathways, which are signalling pathways inside the cell that play an important role in the inflammation involved in RA. Tofacitinib modulates the signalling pathway and prevents the phosphorylation and activation of STATs. JAK enzymes transmit cytokine signalling through pairing of JAKs (e.g., JAK1/JAK2, JAK1/JAK3, JAK1/TYK2, JAK2/JAK2). Tofacitinib has in vitro activity against the JAK1/JAK2, JAK1/JAK3 and JAK2/JAK2 combinations.

The picture on the left is PDBe entry 3lxn, the crystal structure of TYK2 (in red) with CP-690550 (in blue/green), and the picture on the right is PDBe entry 3lxk, the crystal structure of JAK3 (in red) with CP-690550 (in blue/green).

The recommended dosage is 5 mg orally. Tofacitinib has an apparent volume of distribution of 87 L, protein binding of approximately 40%, and a bioavailability of 74%, with an elimination half-life of approximately 3 hours. Metabolism of tofacitinib is mediated mainly by Cytochrome P450 3A4 (CYP3A4), with a minor contribution from Cytochrome P450 2C19 (CYP2C19). Clearance is estimated to be 70% hepatic and 30% renal.
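As a rough back-of-the-envelope illustration of how the half-life and volume of distribution quoted above relate to each other, here is a minimal Python sketch. It assumes a simple one-compartment model with first-order elimination, which is our simplifying assumption and not something taken from the prescribing information; the function names are hypothetical.

```python
import math

# Values quoted in the post above.
HALF_LIFE_H = 3.0   # approximate elimination half-life, hours
VD_L = 87.0         # apparent volume of distribution, litres


def elimination_rate_constant(half_life_h: float) -> float:
    """First-order elimination rate constant k (per hour): k = ln(2) / t1/2."""
    return math.log(2) / half_life_h


def fraction_remaining(t_h: float, half_life_h: float) -> float:
    """Fraction of drug left t_h hours after dosing (first-order decay, assumed)."""
    return math.exp(-elimination_rate_constant(half_life_h) * t_h)


def clearance_l_per_h(vd_l: float, half_life_h: float) -> float:
    """Clearance in L/h for a one-compartment model (assumed): CL = k * Vd."""
    return elimination_rate_constant(half_life_h) * vd_l


print(f"k  = {elimination_rate_constant(HALF_LIFE_H):.3f} per hour")
print(f"CL = {clearance_l_per_h(VD_L, HALF_LIFE_H):.1f} L/h")
print(f"fraction left after 12 h = {fraction_remaining(12.0, HALF_LIFE_H):.4f}")
```

Under these assumptions, 12 hours corresponds to four half-lives, so only about 6% of a dose would remain by then.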

Full prescribing information can be found here.

The license holder is Pfizer.

Wednesday, 14 November 2012


The testing and QC pixies have been really busy with Open ChEMBL, the OSDODS virtual machine appliance. This has made us aware of the support questions we're likely to get, so we'd like to build our knowledge base of support issues with a small pilot release, prior to facing a lot of queries about VirtualBox, ifconfig, etc.

If you are interested in getting early access to Open ChEMBL, and have experience in configuring VMs in heterogeneous environments, please get in touch.

Tuesday, 13 November 2012

DNDi screens MMV’s open access Malaria Box

The Drugs for Neglected Diseases initiative (DNDi) and Medicines for Malaria Venture (MMV) announce today the identification of three chemical series targeting the treatment of deadly neglected tropical diseases (NTDs), through DNDi’s screening of MMV’s open access Malaria Box. The resulting DNDi screening data are among the first data generated on the Malaria Box to be released into the public domain, exemplifying the potential of openly sharing drug development data for neglected patients.

The open access Malaria Box is an MMV initiative launched in December 2011 to catalyse drug discovery for malaria and neglected diseases. It contains 400 molecules, selected by experienced medicinal chemists to offer the broadest chemical diversity possible and is available free of charge. In return, MMV requests that any data gleaned from research on the Malaria Box are shared in the public domain within two years. To date, more than 100 Malaria Boxes have been delivered to over 20 countries for research on diseases including malaria, neglected diseases, HIV and cancer.

DNDi, in partnership with the Laboratory for Microbiology, Parasitology and Hygiene (LMPH), University of Antwerp, screened all the compounds in the Malaria Box against the parasites responsible for the three NTDs on which DNDi mainly focuses: sleeping sickness (human African trypanosomiasis), leishmaniasis (including visceral leishmaniasis, or kala azar, also known as black fever), and Chagas disease. This initial screen identified two potential drug series for the treatment of sleeping sickness and one for leishmaniasis. The DNDi screens have yielded valuable information that will strengthen DNDi’s research pipeline. All the biological data from DNDi’s screen, together with the existing preliminary data from MMV, are now publicly available on the open-source ChEMBL database.

SMS-DrugNet Allosteric Regulators Workshop, Edinburgh, December 2012.

For many classically 'undruggable' targets, there is sometimes the prospect of discovering and optimising allosteric regulators. These can offer advantages in more selective target regulation, or improved drug-like properties for compounds that bind to the allosteric site. However, allosteric regulators are often discovered via serendipity, and many screens are not configured optimally to identify them.

As part of the grant we are involved in, there is an Allostery Workshop taking place at the University of Edinburgh on 4th December 2012. The Workshop, sponsored by the British Council, involves an extensive delegation of scientists from Turkey led by Burak Erman, Koc University, Istanbul, and will bring together a diverse range of disciplines including Biology, Chemistry, Computer Science, Informatics, Mathematics and Medicine. The program for the day will include presentations and a poster session.

Gerard Van Westen from the group here will be presenting some of our recent work on the annotation and classification of allosteric/orthosteric regulators in ChEMBL. The title of his talk is "Beating (the) Competition in Lead Discovery: Property and Structural trends in Allosteric Regulators".

Further details are available on this link.

Update: Space is limited, so if you are interested in going, register early!

Friday, 9 November 2012

ChEMBL Virtual Machine

Next week we will be releasing the ChEMBL Virtual Machine. We referred to it in a previous post and had hoped to make it available this week, but as always with best laid plans....

So, we are using this post to generate some pre-release excitement :) and also to acknowledge the hard work of Rodrigo Ochoa, who worked on this project during his 5-month internship with the ChEMBL group.

We will be providing a lot more detail in next week's blog post, but as a quick summary, the VM is based on an Ubuntu Linux build and comes preloaded with ChEMBL_14 (in PostgreSQL), RDKit and a web application, which makes use of Marvin and allows users to easily get started with querying the ChEMBL data.

New Drug Approvals 2012 - Pt. XXIV - Ocriplasmin (Jetrea™)

On October 17, the FDA approved Ocriplasmin (tradename: Jetrea; Research Code: Microplasmin), a proteolytic enzyme indicated for the treatment of symptomatic vitreomacular adhesion (VMA). VMA is a condition of the eye that results from the liquefaction of the vitreous gel within the human eye and consequent adhesion to the retina. As the eye ages, the vitreous humor can naturally separate from the retina. However, if the separation is not complete, areas of adhesion can occur. The traction from these adhesion areas on the retinal surface is the underlying pathology of symptomatic VMA, which can lead to ocular damage. Ocriplasmin is the first drug approved to treat this condition. It exerts its therapeutic action by selectively breaking down the three major protein components (fibronectin, laminin and collagen) of the vitreous body and vitreoretinal interface, thereby dissolving the protein matrix responsible for VMA. The only alternative treatment is a surgical procedure to remove the vitreous from the eye, known as vitrectomy.

Ocriplasmin is a recombinant truncated form of human plasmin (ChEMBL: CHEMBL1801; UniProt: P00747) with a molecular weight of 27.2 kDa, produced by recombinant DNA technology in a Pichia pastoris expression system.

>PLMN_HUMAN Plasminogen (Ocriplasmin is displayed in bold)

The PDBe entry (PDBe:1bui) for the crystal structure of Ocriplasmin (in green) in complex with Staphylokinase (in red) is displayed below.

The recommended dosage of Ocriplasmin is 0.125 mg (0.1 mL of the diluted solution), administered by intravitreal injection to the affected eye once as a single dose.

The license holder is ThromboGenics, Inc. and the full prescribing information for Ocriplasmin can be found here.

Sunday, 4 November 2012

New Drug Approvals 2012 - Pt. XXIII - Omacetaxine mepesuccinate (SYNRIBO™)

ATC code: L01XX40
Wikipedia: Omacetaxine_mepesuccinate

On October 22nd 2012 the FDA approved omacetaxine mepesuccinate (research code: CGX-635, trivial name: Homoharringtonine, trademark: Synribo™) for the treatment of chronic or accelerated phase chronic myeloid leukaemia (CML) in adults with resistance to two or more tyrosine kinase inhibitors. Omacetaxine is an old drug, identified 35 years ago and known to have activity in CML, but its clinical development was previously halted due to the discovery of BCR-ABL and other targeted kinase inhibitors (PubMed: 21294709). The rapid development of tyrosine kinase inhibitor-resistant tumours has led to the need for agents that can act in these treatment-derived drug-resistant patients. Omacetaxine mepesuccinate has been approved based on observed major cytogenetic response rather than on improvement in disease-related symptoms or increased survival.

Omacetaxine mepesuccinate/homoharringtonine is a cephalotaxine ester prepared by a semi-synthetic process from cephalotaxine, an extract from the leaves of Cephalotaxus sp. found in Southern China. Its molecular formula is C29H39NO9, with an IUPAC name of 4-methyl (2R)-hydroxyl-2-(4-hydroxyl-4-methylpentyl) butanedioate, and its molecular weight is 545.6 Da. It is absorbed following subcutaneous administration, and maximum plasma concentrations are reached after ~30 minutes (Tmax). The volume of distribution (Vd) is variable at 141 +/- 93.4 L following subcutaneous administration of 1.25 mg/m2 twice daily for 11 days.
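The quoted molecular weight can be sanity-checked directly from the molecular formula. The short Python sketch below is only an illustration: the helper names are hypothetical, the atomic weights are rounded standard values, and the parser handles only simple formulae without brackets or charges.

```python
import re

# Rounded standard atomic weights (g/mol); only the elements needed here.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def parse_formula(formula: str) -> dict:
    """Split a simple formula such as 'C29H39NO9' into element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing number after an element symbol means a count of one.
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(formula: str) -> float:
    """Sum atomic weight times count over every element in the formula."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in parse_formula(formula).items())


formula = "C29H39NO9"  # omacetaxine mepesuccinate, as given above
print(parse_formula(formula))
print(f"{molecular_weight(formula):.1f} g/mol")
```

The result agrees with the 545.6 Da given above to one decimal place.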

The mechanistic target of omacetaxine mepesuccinate is not fully known; however, its action includes inhibition of protein synthesis. It was found to bind to the A-site cleft in the peptidyl-transferase center of the large ribosomal subunit in archaebacteria and is understood to prevent the correct positioning of amino acid side chains of incoming aminoacyl-tRNAs, thus inhibiting the elongation step of peptide synthesis.

It was shown in vitro to reduce the protein levels of a number of oncogenic proteins including BCR-ABL and MCL1. Because it does not bind BCR-ABL directly, it is active both in wild-type tumours and in resistant tumours harbouring the ABL drug resistance mutant T315I, as seen in mouse models.

Prescribing information is here.

Synribo™ is marketed by IVAX Pharmaceuticals.

Paper: Mapping small molecule binding data to structural domains

We've just published a paper on mapping the sites of small molecule binding in complex multidomain proteins (pdf here - this link doesn't seem to work at the moment, sorry). The resolution of the mapping is at the level of Pfam domains. We love Pfam, and love it even more that the Pfam team is moving to the EBI this week. The motivation for this work is multifold, and it addresses a pretty big problem in chemogenomics.

Firstly, the issue of domain frustration: you search ChEMBL for homologues of a protein containing a series of distinct domains. If your protein contains a common and uninteresting domain, something like a zinc finger or EGF domain (our interest is in small molecule binding, remember; we're not saying that these domains are completely boring, they're just a lot less interesting from a chemical biology/drug discovery perspective), you'll retrieve a whole bunch of data that is related by sequence but unrelated in terms of small molecule binding. It's just the way bioinformatics works. You can be selective and search with just the sequence of the domain you are interested in, but this only solves half the problem, since there's no guarantee that the compounds retrieved will bind at that domain in the retrieved protein.

Secondly, the issue of selective signalling. It is now common to see proteins as existing in pathways, with key control nodes that are highly connected to many other proteins. A key question is therefore how signalling remains specific under this everything-is-linked-to-almost-everything-by-interactions setting. Well, we think that the annotation of pathways at the domain interaction level is under-appreciated. For example, consider a protein X that interacts with three other proteins Y, W and Z, where Y and Z interact through domain 1 in X, and W via domain 2. If you have a small molecule that binds to domain 1 in X, you will selectively affect downstream signalling to Z and Y only.
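The X/Y/W/Z example above can be written down as a tiny data structure. This is a toy Python sketch, not the method from the paper; the dictionary layout and function names are hypothetical and exist only to make the domain-level reasoning concrete.

```python
# Hypothetical domain-level interaction map for protein X:
# each domain of X lists the partner proteins that bind through it.
DOMAIN_PARTNERS = {
    "domain 1": {"Y", "Z"},
    "domain 2": {"W"},
}


def affected_partners(domain_partners: dict, bound_domain: str) -> set:
    """Partners whose signalling is affected when a compound binds bound_domain."""
    return set(domain_partners.get(bound_domain, set()))


def unaffected_partners(domain_partners: dict, bound_domain: str) -> set:
    """Partners that interact with X only through other domains."""
    all_partners = set().union(*domain_partners.values())
    return all_partners - affected_partners(domain_partners, bound_domain)


print(sorted(affected_partners(DOMAIN_PARTNERS, "domain 1")))
print(sorted(unaffected_partners(DOMAIN_PARTNERS, "domain 1")))
```

A protein-level annotation would only record that X interacts with Y, W and Z, and so could not separate the two cases; it is the domain-level map that does.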

These data will end up in the next version of ChEMBL - but if you want to get hold of any data prior to this, check out the supplementary data for the paper.

Next steps? Well, the approach to mapping and scoring the domains could be improved, and the resolution ideally needs to be at the level of a site within a domain, so that compounds that bind at different structural sites can be differentiated for model development, pharmacophores, etc. Compounds that bind at different sites, with different binding determinants, cannot be expected to predict each other when they are merged into a single model. We have made some progress on this latter sub-problem, and more on that soon.

%T Mapping small molecule binding data to structural domains
%A F.A. Kruger
%A R. Rostom
%A J.P. Overington
%J BMC Bioinformatics 
%D 2012
%V 13
%O doi:10.1186/1471-2105-13-S17-S11

Thursday, 1 November 2012

New Drug Approvals 2012 - Pt. XXII - Perampanel (Fycompa™)

ATC Code: N03AX22
Wikipedia: Perampanel

On October 22nd 2012 the FDA approved Perampanel (research code: E2007, ER-155055-90, trade name Fycompa, CHEMBL1214124). Perampanel is an orally administered drug to be used as an adjunctive therapy for the treatment of partial-onset seizures with or without secondary generalized seizures in patients with epilepsy.

Epileptic seizures are defined as "abnormal excessive or synchronous neuronal activity in the brain". The resulting symptoms can be very diverse, from severe thrashing movements to a very mild, brief loss of awareness. Approximately 4% of the population will have experienced an unprovoked seizure by the age of 80, with a 30-50% chance of a repeat in this group. Seizures can last from a few seconds to a state of life-threatening persistent seizure (known as status epilepticus).

Approximately 25% of people suffering a seizure or status epilepticus will be diagnosed with epilepsy. Treatment may reduce the chance of a second seizure by as much as 50%.

Perampanel acts by non-competitively inhibiting the ionotropic α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid (AMPA) glutamate receptor. This receptor consists of a heteromeric combination of 2 out of 4 known subunits, GluR-1 to GluR-4 (respectively: CHEMBL2009, CHEMBL4016, CHEMBL3595 and CHEMBL3190, or UniProt P42261, P42262, P42263 and P48058). Of these, the combinations GluR-1/GluR-2 and GluR-2/GluR-3 are the most frequent. The specific mechanism by which Perampanel exerts its antiepileptic effect in humans has not been fully elucidated.

Fycompa is a small molecule drug with a molecular mass of 349.4 g/mol, an AlogP of 3.57 and 3 rotatable bonds, and it does not violate the rule of 5.
Canonical SMILES: O=C1N(C=C(C=C1c2ccccc2C#N)c3ccccn3)c4ccccc4
InChI: InChI=1S/C23H15N3O/c24-15-17-8-4-5-11-20(17)21-14-18(22-12-6-7-13-25-22)16-26(23(21)27)19-9-2-1-3-10-19/h1-14,16H
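To make the rule-of-5 statement concrete, here is a small Python sketch of the check. The molecular mass and AlogP are the values quoted above. The hydrogen-bond donor and acceptor counts (0 and 4) are not from the post: they are our own hand count from the SMILES, using the simple Lipinski convention of NH/OH groups as donors and N plus O atoms as acceptors, so treat them as an assumption. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    mol_weight: float       # g/mol
    logp: float             # calculated logP (AlogP here)
    h_bond_donors: int      # NH + OH count
    h_bond_acceptors: int   # N + O count


def rule_of_five_violations(p: Properties) -> list:
    """Return the names of the Lipinski criteria that the compound breaks."""
    checks = {
        "molecular weight > 500": p.mol_weight > 500,
        "logP > 5": p.logp > 5,
        "H-bond donors > 5": p.h_bond_donors > 5,
        "H-bond acceptors > 10": p.h_bond_acceptors > 10,
    }
    return [name for name, violated in checks.items() if violated]


# Mass and AlogP as quoted above; donor/acceptor counts are hand-counted assumptions.
perampanel = Properties(mol_weight=349.4, logp=3.57,
                        h_bond_donors=0, h_bond_acceptors=4)

print(rule_of_five_violations(perampanel))
```

An empty list means no criterion is broken, which matches the statement above.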

The recommended starting dose of Perampanel in the absence of other CYP3A4 enzyme-inducing antiepileptic drugs is 2 mg once daily taken orally at bedtime (and can be incremented to the recommended dose range of 8 mg to 12 mg once daily). The recommended starting dose of Perampanel in the presence of CYP3A4 enzyme-inducing antiepileptic drugs is 4 mg and patients should be monitored closely for response.

Perampanel is extensively metabolized via initial oxidation and sequential glucuronidation. Oxidative metabolism is mediated by CYP3A4 and/or CYP3A5 based on results of in vitro studies using recombinant human CYPs and human liver microsomes. Other CYP enzymes may also be involved.

The license holder is Eisai Inc. and the full prescribing information can be found here.