ChEMBL Resources


Tuesday, 30 November 2010

canSAR v1.0 launched

Yesterday, Cancer Research UK announced the launch of the first full version of canSAR, the Institute of Cancer Research's integrated cancer research and drug discovery resource. canSAR integrates large volumes of disparate data covering most aspects of cancer biology and chemistry, and is an example of how the ChEMBL database can be complemented with therapeutic-area-specific knowledge. canSAR integrates biological annotation, gene expression, RNA interference, structural biology and protein interaction network data, as well as chemical and pharmacological data. It contains annotation on the entire human proteome and more than 8 million experimental data points, including RNAi and chemical screening data. For full release notes please see canSAR news. canSAR is updated monthly. As well as the wonderful ChEMBL, the data in canSAR come from a large number of sources, including ArrayExpress, PDBe, ROCK, STRING, Genomics of Drug Sensitivity in Cancer, COSMIC, BindingDB, SCOP and Pfam; we (at the ICR) are grateful to our friends at all these places for their help. In the new year, we will be holding a series of webinars and walkthroughs, and details of these will be posted on the ChEMBL-og.

Sunday, 28 November 2010

Any old unwanted SGI dial boxes?

I am doing a surprising amount of molecular graphics work at the moment, and have finally realised I don't have the skills or coordination to use a mouse and keyboard for simple rotation, scaling, clipping, etc. It turns out it's possible to connect a dial box to a MacBook Pro, with a little bit of messing around with drivers. So, does anyone have an old, unwanted dial box for an SGI machine? Part numbers 9980991, 9980992 and 9780804 are all apparently OK. If you have one of these, let me know.

Monday, 22 November 2010

2010 New Drug Approvals - Pt. XVIII - Tesamorelin (Egrifta)

ATC code (partial): H01AC

Also this month, on November 10th, the FDA approved Tesamorelin under the trade name Egrifta. Tesamorelin (research code: TH-9507) is an analog of human growth hormone-releasing factor (GRF) (UniProt:P01286, synonyms: Somatoliberin, GRF, GHRH) indicated for the reduction of excess abdominal fat in HIV-infected patients with lipodystrophy. Lipodystrophy is a condition in which excess fat develops in atypical areas of the body, most notably around the liver, stomach and other abdominal organs. It is observed as a side effect of many antiretroviral drugs used to treat HIV. Tesamorelin is the first FDA-approved treatment specifically for lipodystrophy.

The -relin INN stem covers prehormones or hormone releasing peptides, a very broad range of targets and pharmacology. The -morelin stem sub-group covers growth hormone-release stimulating peptides including capromorelin, dumorelin, examorelin, ipamorelin, pralmorelin, rismorelin, sermorelin, somatorelin, and tabimorelin.

Tesamorelin is an N-terminally modified variant of the natural 44-residue peptide Somatoliberin, a hypothalamic peptide that acts on pituitary somatotroph cells to stimulate the synthesis and pulsatile release of endogenous growth hormone (GH), which is both anabolic and lipolytic. Somatoliberin is a member of the glucagon family (Pfam:PF00123) of endogenous peptide ligands. Tesamorelin exerts its therapeutic effects by binding to, and acting as an agonist of, GHRHr, a type-2 (or class B, or secretin-like) GPCR (UniProt: Q02643, ChEMBL:CHEMBL158, Pfam:PF00002) on pituitary somatotrophs. The growth hormone release this triggers in turn acts on a variety of target cells, including chondrocytes, osteoblasts, myocytes, hepatocytes and adipocytes, resulting in a host of pharmacodynamic effects that are primarily mediated by insulin-like growth factor 1 (IGF-1) produced in the liver and in peripheral tissues.

Tesamorelin has a molecular weight of 5135.9 Da. Its absolute bioavailability following s.c. dosing is less than 4%, with a volume of distribution of 10.5 L/kg and an elimination half-life (t1/2) of 38 minutes (both measured in HIV-infected patients). The recommended dosage is 2 mg injected subcutaneously once daily (typically in the abdomen), so a typical daily dose is 0.39 umol.
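The molar figure above is simply the mass dose divided by the molecular weight. A minimal Python sketch of that arithmetic, using the values quoted in this post (the helper name `mass_dose_to_umol` is our own, for illustration only):

```python
def mass_dose_to_umol(dose_mg: float, mw_g_per_mol: float) -> float:
    """Convert a mass dose (mg) to an amount (umol).

    mg divided by g/mol gives mmol; multiplying by 1000 gives umol.
    """
    return dose_mg / mw_g_per_mol * 1000.0


# Tesamorelin: 2 mg once daily s.c., MW 5135.9 Da (figures from this post)
tesamorelin_umol = mass_dose_to_umol(2.0, 5135.9)
print(f"Tesamorelin daily dose: {tesamorelin_umol:.2f} umol")  # ~0.39 umol
```

Because the peptide is roughly ten times heavier than a typical small-molecule drug, a milligram-scale dose corresponds to well under a micromole.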


Tesamorelin is produced synthetically, and its amino acid sequence is identical to that of human Somatoliberin/GRF. It is modified by attachment of a 3-hexenoyl moiety via an amide linkage to the N-terminal tyrosine residue. This chemical modification blocks proteolytic degradation by endogenous proteases such as DPP-IV, thus prolonging the half-life of the peptide (the inhibition of DPP-IV is itself the basis of a number of therapies for type-II diabetes, the gliptins). A further chemical modification is C-terminal amidation; this post-translational modification is also found in the naturally produced peptide. Tesamorelin is closely related chemically to a number of other clinical agents, such as Sermorelin (a shorter, but still active, version of Somatoliberin/GRF).

The full prescribing information can be found here.

The license holder is EMD Serono, Inc. and the product website is

Domain-level annotation of binding-sites for ligands within ChEMBL

One big problem with simple sequence searching of ChEMBL using tools like BLAST is the introduction of contextually incorrect target relationships through matches to irrelevant domains. For example, imagine a protein, X, that contains two domains of types A and B, and a second protein, Y, which also contains two domain types, B and C. If the ligand is known to bind to domain type A, there is no ligand-binding relationship between X and Y; however, if the ligand binds at domain type B in X, then there is a relevant relationship between X and Y. This may sound like a rare example, but it is surprisingly common (and extremely annoying), since the majority of eukaryotic proteins are multi-domain, and the presence of certain domains, such as an EGF-like domain (Pfam:PF00008), can greatly complicate the analysis of sequence searches. What is really needed is a reliable mapping (or, more generally, a probabilistic score) of the ligand-binding domains within a particular protein.

Enough of all these Xs and Ys! Here is a real example, for three interesting proteins, Axl, Lck and SOCS3. As you can see, protein kinase domain inhibitors are only 'transferable' between Axl and Lck, while SH2 binders are only 'transferable' between Lck and SOCS3.
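The point can be made concrete with a toy model in which each protein is reduced to a set of Pfam-style domain names and each ligand class is tagged with the domain it binds. The domain sets below are simplified and illustrative (order and copy number are ignored, and Axl's extracellular Ig-like and fibronectin type III domains are collapsed to single entries), and `transferable` is a hypothetical helper, not part of ChEMBL; it only shows why sharing *some* domain is not enough.

```python
from typing import Dict, Set

# Simplified, illustrative domain architectures (assumption: unique domain
# names only, ignoring domain order and copy number).
ARCHITECTURES: Dict[str, Set[str]] = {
    "Axl": {"Ig", "fn3", "Pkinase_Tyr"},
    "Lck": {"SH3", "SH2", "Pkinase_Tyr"},
    "SOCS3": {"SH2", "SOCS_box"},
}


def shares_any_domain(a: str, b: str) -> bool:
    """Naive whole-sequence view: any shared domain links the two proteins."""
    return bool(ARCHITECTURES[a] & ARCHITECTURES[b])


def transferable(binding_domain: str, a: str, b: str) -> bool:
    """Domain-aware view: ligand data transfer only if both proteins
    contain the domain that the ligand actually binds."""
    return binding_domain in ARCHITECTURES[a] and binding_domain in ARCHITECTURES[b]


for pair in [("Axl", "Lck"), ("Lck", "SOCS3"), ("Axl", "SOCS3")]:
    print(
        pair,
        "any shared domain:", shares_any_domain(*pair),
        "| kinase inhibitors:", transferable("Pkinase_Tyr", *pair),
        "| SH2 binders:", transferable("SH2", *pair),
    )
```

A naive "any shared domain" link chains Axl to Lck and Lck to SOCS3, so a transitive similarity search could drag kinase inhibitor data all the way to SOCS3; the domain-aware check blocks both spurious transfers.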

Here is a graph (as a pie chart) of the Pfam domains for the ligand-binding regions of all the protein targets in the current (ChEMBL_08) target dictionary. The annotation was performed by a simple classifier heuristic; we are validating the accuracy of this approach at the moment, but it appears to be largely correct. Once we're happy with the results, we'll add the ligand-binding domain data to the target dictionary.

Sunday, 21 November 2010

Do you want to know about the ChEMBL Database User Group meeting?

The ChEMBL Database User Group on LinkedIn is perilously close to 100 members; in fact, we need just one more to make it to the magic century! We have found a well-known industry figure to help organise our first User Group meeting, and we'll start posting details shortly on the LinkedIn group site.

What Are The Key Clinical Candidate Disclosure Meetings?

Here is a call for assistance; all input will end up published on the ChEMBL-og, accessible to one and all. What we're looking for is a pretty comprehensive list of key clinical candidate disclosure meetings, ideally those with disclosure of chemical structure, functional assay, pharmacokinetic and toxicology data. You know the sort of meeting: the one where the key data on a hot compound is disclosed for the first time.

I've put together a preliminary list here, from memory and a little bit of googling. It is far, far, far from perfect and woefully incomplete, so I am now looking for extra meetings in the areas not yet covered, and for suggestions of any I have missed in the areas that are. As you will see, I've used the ATC classification to structure the list; although this is not perfect for things like anti-microbials, it is actually a pretty good framework to hang this off.

If you have any suggestions, please mail them in.

Finally, if you are interested in hearing our plans, and maybe collaborating on some informatics aspect of this, feel free to contact us.

Friday, 19 November 2010

2010 New Drug Approvals - Pt. XVII - Eribulin Mesylate (Halaven)

ATC code (partial): L01C

On November 15th, 2010, the FDA approved Eribulin Mesylate (research code: E-7389) under the trade name Halaven (TradeMark:Halaven). It is indicated for the treatment of patients with late-stage metastatic breast cancer who have previously received at least two chemotherapeutic regimens for metastatic disease. Phase III trials showed that patients survived a median of 2.5 months longer than patients treated with other current alternatives. Eribulin is a synthetic analogue of halichondrin B, a cytotoxic polyether macrolide marine natural product.

The mechanism of action of Eribulin is anti-mitotic and is mediated via tubulin binding, which leads to a G2/M block in the cell cycle; after prolonged stalling in this state, cells enter apoptosis and are then cleared.

Eribulin is a large (MW 729.9 for Eribulin and 826.0 for the mesylate salt) synthetic compound (an analogue of halichondrin B). An IUPAC name for the structure is 11,15:18,21:24,28-Triepoxy-7,9-ethano-12,15-methano-9H,15H-furo[3,2-i]furo[2',3':5,6]pyrano[4,3-b][1,4]dioxacyclopentacosin-5(4H)-one, 2-[(2S)-3-amino-2-hydroxypropyl]hexacosahydro-3-methoxy-26-methyl-20,27-bis(methylene)-, (2R,3R,3aS,7R,8aS,9S,10aR,11S,12R,13aR,13bS,15S,18S,21S,24S,26R,28R,29aS)-, methanesulfonate (salt). The most striking part of the structure is the highly fused, rigid ring system; as you would expect, the synthesis is complicated. The structure contains many of the classical features of natural products: a high number and fraction of defined chiral centers, a high ratio of oxygens to nitrogens, and a high ring count.

The recommended dosing is 1.4 mg/m2, given as two intravenous doses separated by seven days, with the cycle repeated after a further two weeks. An average adult human has a body surface area of ca. 1.8 m2, so this equates to a single dose of ~3 umol. The mean half-life of Eribulin is ~40 hr, with a mean volume of distribution of ~80 L/m2 and a mean clearance of ~1.8 L/hr/m2. Plasma protein binding is around 58%. Eribulin is metabolically stable and is largely unmetabolised, with the majority of the dose being excreted unchanged in the feces.
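The ~3 umol figure comes from scaling the per-m2 dose by body surface area and then dividing by molecular weight. A minimal sketch with the numbers above; the function names are illustrative, and the 1.8 m2 "average adult" is the assumption stated in the text, not a patient-specific value.

```python
def bsa_dose_mg(dose_mg_per_m2: float, bsa_m2: float) -> float:
    """Body-surface-area (BSA) dosing: mg/m2 multiplied by BSA in m2."""
    return dose_mg_per_m2 * bsa_m2


def mg_to_umol(dose_mg: float, mw_g_per_mol: float) -> float:
    """Mass (mg) to amount (umol): mg / (g/mol) = mmol, then x1000."""
    return dose_mg / mw_g_per_mol * 1000.0


# 1.4 mg/m2 for an 'average' adult BSA of 1.8 m2 (figures from this post)
single_dose_mg = bsa_dose_mg(1.4, 1.8)  # ~2.52 mg per administration

# The post gives two molecular weights; both are shown because the molar
# amount depends on which form the labelled dose refers to.
umol_as_mesylate = mg_to_umol(single_dose_mg, 826.0)
umol_as_free_base = mg_to_umol(single_dose_mg, 729.9)

print(f"single dose: {single_dose_mg:.2f} mg")
print(f"  = {umol_as_mesylate:.2f} umol (mesylate MW)")
print(f"  = {umol_as_free_base:.2f} umol (free base MW)")
```

Either way the result rounds to roughly 3 umol per administration, consistent with the figure in the text.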

Eribulin binds at (or near) the vinca domain of tubulin, a region that is located at the interface of two tubulin heterodimers when arranged end to end, and that overlaps the exchangeable GTP site on β-tubulin (Bai et al). β-tubulin is a small family of related human proteins (PFAM:PF03953, HOMSTRAD:tubulin, and UniProt:P07437 for a specific member) that are key components of microtubules. There are multiple isoforms of β-tubulin, e.g. "tubulin-beta1" (ChEMBLDB ID: CHEMBL1915, also in canSAR) and "tubulin-beta5" (ChEMBLDB ID: CHEMBL5444, also in canSAR). Multiple 3-D structures are available for alpha-/beta-tubulins, including PDBe:1tub. Tubulins are the target of several other classes of anticancer drugs, such as Paclitaxel (aka taxol) and Vinblastine (both similarly cytotoxic natural products).

Full prescribing information can be found here. The license holder for Halaven™ is Eisai Inc.

Monday, 15 November 2010

ChEMBL_08 Released

We are pleased to announce the release of chembl_08. This version of the ChEMBL database was prepared 26th October 2010 and contains:
  • 735393 compound records
  • 636269 compounds (of which 635933 have molfiles)
  • 488898 assays
  • 2973034 activities
  • 8088 targets
  • 38462 publications
  • 5 activity data sources
You can also download the ChEMBL database (Oracle 9i, 10g, 11g or MySQL) from our ftp site:
Changes to the database (please see the release notes for more detail):
  1. FDA approved drugs have now been added to the compounds table*. Some drugs (e.g., biotherapeutics) do not have a structure/molfile, and not all drugs have bioactivity data associated with them. Further information for these drugs (e.g., mechanism of action) will be added in subsequent releases.
  2. Parent compounds have been generated by removing the salt component from any compounds tested as a salt form. Both the parents and the salt forms are recorded in the compounds table and a new table: molecule_hierarchy shows the relationship between them.
  3. ChEMBL identifiers (chembl_id) have been added to the compounds, target_dictionary, assays and docs tables. These take the form 'CHEMBL' followed immediately by an integer (e.g., CHEMBL941) and are used on the interface. Small molecules within the database will still have a ChEBI ID, and protein targets a UniProt accession, in addition.
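Because chembl_id values have a fixed shape, 'CHEMBL' followed immediately by an integer, client code that builds or parses links can validate them with a simple pattern. A minimal Python sketch; the function names are hypothetical and not part of any ChEMBL tool, and treating lower-case input as invalid is our own choice.

```python
import re
from typing import Optional

# 'CHEMBL' immediately followed by one or more ASCII digits, nothing else.
CHEMBL_ID_RE = re.compile(r"CHEMBL([0-9]+)")


def is_chembl_id(text: str) -> bool:
    """True if the whole string is a well-formed chembl_id."""
    return CHEMBL_ID_RE.fullmatch(text) is not None


def chembl_id_number(text: str) -> Optional[int]:
    """Integer suffix of a chembl_id, or None if the string is malformed."""
    m = CHEMBL_ID_RE.fullmatch(text)
    return int(m.group(1)) if m else None


for candidate in ["CHEMBL941", "CHEMBL 941", "chembl941", "CHEMBL"]:
    print(f"{candidate!r}: valid={is_chembl_id(candidate)}, "
          f"number={chembl_id_number(candidate)}")
```

Using `fullmatch` rather than `match` avoids accepting strings with trailing characters; if user input may arrive in lower case, normalising with `.upper()` before validating is one option.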
You can access the data via the ChEMBL database interface:
Changes to the interface:
  1. The interface now uses chembl_id for compounds, assays and targets. Old URLs (e.g., using chebi_id/assay_id/tid) will continue to work, however we recommend using the chembl_id when linking to the ChEMBL interface.
  2. The compound, target and assay report card pages now include interactive pie charts that allow users to link to related data sets in the ChEMBL database.
  3. The compound report card page has been updated to include drug icons for FDA-approved molecules* in ChEMBL.
The ChEMBL Team

*The identification and loading of the FDA-approved compounds in the ChEMBL database is part of a larger process of integrating drug and clinical candidate information into the ChEMBL database. This process has not yet been completed, so please expect enhancements to the underlying schema and interface in future releases of the ChEMBL database.

Saturday, 13 November 2010

SMR Meeting - Trends in Medicinal Chemistry - 9th December 2010

The next Society for Medicines Research meeting is on the 9th December 2010 at the National Heart and Lung Institute, Kensington, London. These are my favourite day meetings: cheap, well organised and very applied to actual drug discovery. This meeting has some great talks lined up.

Unfortunately, I cannot go to the SMR meeting - I will be ill (probably), in a hotel (hopefully), maybe on the beach (certainly) at the Zing Structural Biology Conference. The graph above answered an important question for me.

Thursday, 11 November 2010

2010 New Drug Approvals - Pt. XVI - Ceftaroline Fosamil (Teflaro)

ATC code (partial): J01DI

On October 29th, the FDA approved Ceftaroline Fosamil under the trade name Teflaro. Ceftaroline Fosamil (research code TAK-599; the parent drug, Ceftaroline, is also known as T-91,825) is an antibiotic indicated for the treatment of adults with acute bacterial skin and skin structure infections (ABSSSI) caused by susceptible Gram-positive and Gram-negative microorganisms, such as Staphylococcus aureus (including methicillin-susceptible and -resistant isolates), Streptococcus pyogenes, Streptococcus agalactiae, Escherichia coli, Klebsiella pneumoniae and Klebsiella oxytoca, and also for the treatment of community-acquired bacterial pneumonia (CABP) caused by susceptible Gram-positive and Gram-negative bacteria, such as Streptococcus pneumoniae (including cases with concurrent bacteremia), Staphylococcus aureus (methicillin-susceptible isolates only), Haemophilus influenzae, Klebsiella pneumoniae, Klebsiella oxytoca and Escherichia coli.

Ceftaroline Fosamil is a semisynthetic antibacterial of the cephalosporin class of beta-lactams, which were first identified in 1948 from the fungus Cephalosporium (now Acremonium). Ceftaroline Fosamil is the N-phosphono prodrug of the bioactive Ceftaroline. As with other drugs in the same class, the bactericidal action of Ceftaroline is mediated through covalent binding to essential penicillin-binding proteins (PBPs) involved in bacterial cell wall synthesis. In particular, Ceftaroline is bactericidal against S. aureus, including methicillin-resistant S. aureus (MRSA), due to its affinity for PBP2a (UniProt: Q53707, ChEMBL: 19669), the PBP produced by MRSA that is not well inhibited by other antibiotics such as methicillin (ChEMBL: 116716), oxacillin (ChEMBL: 156432), penicillin and amoxicillin (ChEMBL: 657723). Ceftaroline is also active against S. pneumoniae due to its affinity for PBP2x (UniProt: P14677, ChEMBL: 102467).

Ceftaroline Fosamil is a large 'small-molecule' semisynthetic prodrug (molecular weight of 685.7 g.mol-1 for Ceftaroline Fosamil itself and 762.7 g.mol-1 for the monoacetate salt); it is slightly lipophilic and soluble in water. Following injection, Ceftaroline Fosamil has a volume of distribution of 20.3 L, low plasma protein binding (ppb) of 20%, an elimination half-life of 1.6 hr and a plasma clearance of 9.58 L/hr. Ceftaroline Fosamil is primarily eliminated by the kidneys (88% of the dose is recovered in urine), mainly as the active metabolite Ceftaroline (64% as Ceftaroline and 2% as an inactive metabolite). Ceftaroline is neither an inhibitor nor a substrate of the major cytochrome P450 isoenzymes. The recommended dosage of Ceftaroline Fosamil is 600 mg every 12 hours by intravenous infusion administered over one hour.

The full prescribing information can be found here. Like other cephalosporins, the structure of Ceftaroline Fosamil, (6R,7R)-7-{(2Z)-2-(ethoxyimino)-2-[5-(phosphonoamino)-1,2,4-thiadiazol-3-yl]acetamido}-3-{[4-(1-methylpyridin-1-ium-4-yl)-1,3-thiazol-2-yl]sulfanyl}-8-oxo-5-thia-1-azabicyclo[4.2.0]oct-2-ene-2-carboxylate, contains a cyclic amide (the beta-lactam ring) fused with a six-membered ring (the cephem ring). Another notable feature of Ceftaroline Fosamil is the thiazolylthio group, which is thought to be crucial for activity against MRSA.

The license holder is Forest Pharmaceuticals, Inc. and the product website is

Small Molecules Bioactivity Course - February 2011

Registration for the 2011 residential ChEMBL training course, running from the 14th to the 18th of February, is now open; further details can be found at this link.

What better way to spend Valentine's Day?

The picture above is of the ingredients in a Twinkie, apparently.

Wednesday, 10 November 2010

Staff Position in ChEMBL - EU-OPENSCREEN database developer

The EBI recruitment website now has the EU-OPENSCREEN developer position detailed. Closing date is the 12th December 2010.

The job is an exciting opportunity to work on establishing a pan-European archive of academic screening data.

Tuesday, 9 November 2010

2010 New Drug Approvals - Pt. XV - Lurasidone (Latuda)

ATC code (partial): N05AE

On October 28th 2010, the FDA approved Lurasidone (Tradename:Latuda), also known by the research code SM-13,496. Lurasidone is an atypical antipsychotic agent indicated for the treatment of schizophrenia.

Lurasidone displays broad polypharmacology against a wide range of rhodopsin-like aminergic GPCRs, acting as an antagonist with high affinity at dopamine D2 receptors (UniProt: P14416, ChEMBL: 72) (Ki of 1 nM), serotonin 5-HT2A receptors (UniProt: P28223, ChEMBL: 107) (Ki of 0.47 nM) and 5-HT7 receptors (UniProt: P34969, ChEMBL: 10209) (Ki of 0.49 nM), and with moderate affinity at alpha-2C adrenergic receptors (UniProt: P18825, ChEMBL: 218) (Ki of 10.8 nM) and alpha-2A adrenergic receptors (UniProt: P08913, ChEMBL: 52) (Ki of 40.7 nM). Lurasidone is also a partial agonist at serotonin 5-HT1A receptors (UniProt: P08908, ChEMBL: 51) (Ki of 6.4 nM), and exhibits little or no affinity for histamine H1 (UniProt: P35367, ChEMBL: 127) and muscarinic M1 receptors (UniProt: P11229, ChEMBL: 61) (IC50 > 1000 nM for both). The efficacy of Lurasidone is thought to be primarily related to its D2 and 5-HT2A antagonism. Atypical antipsychotics as a class display similarly complex polypharmacology.
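Comparing affinities is often easier on a log scale. A short sketch that converts the Ki values quoted above to pKi (pKi = 9 - log10(Ki in nM)) and ranks the targets; the H1 and M1 values are left out because only a bound is given (> 1000 nM, i.e. pKi < 6). The dictionary and function names are illustrative.

```python
import math

# Ki values (nM) for lurasidone, as listed in this post.
KI_NM = {
    "5-HT2A": 0.47,
    "5-HT7": 0.49,
    "D2": 1.0,
    "5-HT1A": 6.4,
    "alpha-2C": 10.8,
    "alpha-2A": 40.7,
}


def pki(ki_nm: float) -> float:
    """pKi = -log10(Ki in molar) = 9 - log10(Ki in nM)."""
    return 9.0 - math.log10(ki_nm)


# Lowest Ki (highest affinity) first
ranked = sorted(KI_NM.items(), key=lambda kv: kv[1])
for target, ki in ranked:
    print(f"{target:8s} Ki = {ki:6.2f} nM  pKi = {pki(ki):.2f}")

# Fold difference between D2 and alpha-2A (ratio of Ki values)
print(f"D2 vs alpha-2A: {KI_NM['alpha-2A'] / KI_NM['D2']:.0f}-fold")
```

On this scale the three high-affinity targets (5-HT2A, 5-HT7, D2) sit within about a third of a log unit of each other, while alpha-2A binding is roughly 40-fold weaker than D2.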

Lurasidone is a synthetic small-molecule drug (Molecular Weight of 492.7 g/mol for Lurasidone itself and 529.14 g.mol-1 for the dosed HCl salt), is fully Rule-of-Five compliant, lipophilic and very slightly soluble in water.

Lurasidone has low systemic bioavailability (9-19%), a high volume of distribution (6173 L) and high plasma protein binding (ppb) of ~99%. The half-life is 18 hours, and steady-state plasma levels are reached 7 days after starting regular dosing. Lurasidone is predominantly metabolized by CYP3A4 into four major metabolites (two active and two 'inactive'); metabolic pathways include hydroxylation of the norbornane ring, N-dealkylation and S-oxidation. The apparent clearance is 3902 mL/min, with the bulk of the drug being excreted in the feces. Dosing is oral, with a recommended starting dose of 40 mg once daily (equivalent to 81 umol) and a recommended maximum dose of 80 mg daily.
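As a rough consistency check (our own, not from the label), a one-compartment model relates half-life, volume of distribution and clearance by t1/2 = ln(2) x Vd / CL. Both quoted values are apparent (oral) parameters, so the unknown bioavailability cancels in the ratio. A minimal Python sketch, converting clearance from mL/min to L/h first; the function names are illustrative.

```python
import math


def half_life_h(vd_l: float, cl_l_per_h: float) -> float:
    """One-compartment approximation: t1/2 = ln(2) * Vd / CL."""
    return math.log(2) * vd_l / cl_l_per_h


def ml_per_min_to_l_per_h(cl_ml_min: float) -> float:
    """x60 min/h, then /1000 mL/L."""
    return cl_ml_min * 60.0 / 1000.0


cl = ml_per_min_to_l_per_h(3902)  # apparent clearance from this post
t_half = half_life_h(6173, cl)    # apparent Vd of 6173 L from this post
print(f"CL = {cl:.1f} L/h, estimated t1/2 = {t_half:.1f} h")
```

The estimate, a little over 18 hours, agrees well with the quoted half-life; a one-compartment model is a simplification, so close agreement should not be over-interpreted.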

Lurasidone is a chiral benzisothiazole derivative; the benzisothiazole is the fused five-six bicyclic ring system on the right of the figure above. Its structure, (3aR,4S,7R,7aS)-2-{(1R,2R)-2-[4-(1,2-benzisothiazol-3-yl)piperazin-1-ylmethyl]cyclohexylmethyl}hexahydro-4,7-methano-2H-isoindole-1,3-dione, contains a cyclic imide and a piperazine ring. The central piperazine nitrogen is basic. The chemical structure, properties and pharmacology are similar to those of Ziprasidone (Trademark:Geodon).

The full prescribing information can be found here.

Lurasidone has a boxed warning (colloquially known as a 'black box').

The license holder is Sunovion Pharmaceuticals Inc. and the product website is

Saturday, 6 November 2010

ChEMBL team in Japan - March 2011

Some of the ChEMBL team will be in Tokyo, Japan for the week of 14th to 18th March 2011, for some training with a valued collaborator. We have some spaces in our schedule, so we could visit and talk if there is interest. We have a native Japanese speaker in our group, so we can present in both Japanese and English. If you would like us to visit you, please mail us.


Wednesday, 3 November 2010

2010 New Drug Approvals - Pt. XIV - Dabigatran etexilate (Pradaxa)

ATC Code: B01AE07

Dabigatran etexilate was approved by the FDA on October 19th 2010. Dabigatran etexilate (research code BIBR-1048; the active drug, Dabigatran, is BIBR-953) is approved for the treatment of patients with atrial fibrillation at risk of embolism or stroke. It is a first-in-class (for the US) oral drug that prevents blood clotting, and hence stroke, by direct inhibition of thrombin, and is marketed under the trade name Pradaxa in Europe and the US, and Pradax in certain other territories. In Europe, an earlier oral direct thrombin inhibitor, Ximelagatran (trademark:Exanta, trademark:Exarta, also known as H376/95), was approved, but was subsequently withdrawn due to commercial and perceived safety issues.

The formation of blood clots in the circulation can cause embolism or stroke (cerebrovascular accident, CVA) if other risk factors are present. Depending on the number of risk factors, the risk of suffering a stroke increases up to 7-fold in patients with atrial fibrillation. Patients with atrial fibrillation are therefore often treated with the anticoagulant warfarin (ChEMBL: 494165) to prevent the formation of blood clots. Warfarin has a narrow therapeutic index and also shows substantial inter-patient variability due to underlying genetic differences, so patients require regular monitoring.

Dabigatran etexilate is converted to the active drug Dabigatran, which inhibits blood clotting through direct inhibition of thrombin (UniProt: P00734) and has a larger therapeutic window than warfarin (which is an irreversible inhibitor of vitamin K epoxide reductase). Thrombin is a key serine protease in the blood clotting cascade, activating coagulation Factors V, VIII, XI and XIII, as well as cleaving fibrinogen and so converting it into clot-forming fibrin (also known as Factor Ia) (UniProt: P02679). There are many known structures of thrombin, in both prothrombin and mature thrombin forms (for example, see PDBe:2bvs).

Thrombin is a trypsin-like serine proteinase (Pfam:PF00089), and cleaves after arginine (and lysine) residues at the P1 position in substrates; Dabigatran is a substrate mimic (a peptidomimetic), with the phenylamidine mimicking the arginine sidechain.
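In the Schechter-Berger notation used above, P1 is the residue immediately N-terminal to the scissile bond, and P1' is the residue after it. The deliberately naive Python sketch below lists every Arg/Lys P1 position in a sequence; real thrombin specificity also depends on neighbouring subsites and exosite interactions, so this heavily over-predicts and only illustrates the notation. The function name and example peptide are hypothetical.

```python
from typing import List, Tuple


def candidate_p1_sites(seq: str, p1_residues: str = "RK") -> List[Tuple[int, str]]:
    """Return (1-based P1 position, P1-P1' dipeptide) for every residue in
    p1_residues that is followed by another residue, i.e. has a scissile bond."""
    sites = []
    for i, aa in enumerate(seq[:-1]):  # the last residue has no C-terminal bond
        if aa in p1_residues:
            sites.append((i + 1, seq[i:i + 2]))
    return sites


# Hypothetical example peptide, not a real thrombin substrate
peptide = "AARGAKLLRS"
for pos, bond in candidate_p1_sites(peptide):
    print(f"P1 at residue {pos}: cleave between {bond[0]} and {bond[1]}")
```

Dabigatran's phenylamidine occupies the same S1 pocket that accommodates the P1 arginine side chain of a substrate, which is why it behaves as a substrate mimic.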

Upon absorption, Dabigatran etexilate is readily metabolized to the active drug Dabigatran by ester hydrolysis at two distinct positions (Dabigatran etexilate is therefore a double prodrug). The charged groups of Dabigatran (the amidine and the carboxylic acid) are poorly absorbed across membranes, and therefore a lipophilic ester and carbamate are added to mask these groups during oral dosing and absorption. Dabigatran is further metabolized to four different acyl glucuronides, which are equally active as thrombin inhibitors. The absolute oral bioavailability of Dabigatran (dosed as the prodrug) is 3-7%. The fraction of Dabigatran bound to plasma proteins (ppb) is ~35%, its volume of distribution (Vd) is 50-70 L, and clearance is primarily renal, with a half-life (t1/2) of 12-14 hours.

The recommended dose of Dabigatran etexilate is 150 mg twice daily (300 mg per day).

Dabigatran etexilate is marketed under the trade name Pradaxa by Boehringer Ingelheim.
Full prescribing information can be found here.