
Friday, 24 December 2010

Summary of U.S. New Drugs For 2010

Here is an initial list of the 2010 US new approved drugs (specifically New Molecular Entities). The way we count things, there were 19 novel newly approved drug substances in the US last year.

#   USAN                  Tradename            Icon
1   Tocilizumab           Actemra / RoActemra
2   Dalfampridine         Ampyra
3   Liraglutide           Victoza
4   Velaglucerase alfa    VPRIV
5   Carglumic acid        Carbaglu
6   Polidocanol           Asclera
7   Denosumab             Prolia
8   Cabazitaxel           Jevtana
9   Sipuleucel-T          Provenge
10  Ulipristal Acetate    Ella
11  Alcaftadine           Lastacaft
12  Pegloticase           Krystexxa
13  Fingolimod            Gilenya
14  Dabigatran Etexilate  Pradaxa
15  Lurasidone            Latuda
16  Ceftaroline Fosamil   Teflaro
17  Eribulin Mesylate     Halaven
18  Tesamorelin           Egrifta
19  Dienogest             Natazia


12 are small molecule drugs, and 7 are biologicals. Of the 19 drugs, 6 (32%) are small molecule synthetic drugs, 6 (32%) are small molecule natural product-derived drugs, 6 (32%) are biologicals (including peptides, enzymes and mAbs) and one (5%) is a cell-based therapy. Also interesting is the fact that the majority (11 of 19, 58%) are parenterally dosed.
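As a quick check of the breakdown above, here is a minimal Python sketch that recomputes the percentages from the counts quoted in the post. The dictionary keys and the `percent` helper are names made up for this illustration.

```python
# Category counts as quoted in the post (they sum to the 19 approvals).
counts = {
    "synthetic small molecule": 6,
    "natural product-derived small molecule": 6,
    "biological (peptide, enzyme, mAb)": 6,
    "cell-based therapy": 1,
}
parenteral = 11  # number of parenterally dosed drugs, from the post

total = sum(counts.values())


def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


shares = {name: percent(n, total) for name, n in counts.items()}
parenteral_share = percent(parenteral, total)

for name, share in shares.items():
    print(f"{name}: {counts[name]} of {total} ({share}%)")
print(f"parenteral: {parenteral} of {total} ({parenteral_share}%)")
```

Note that the rounded category percentages add up to 101 rather than 100, which is just a rounding effect.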


For details on the icon set used in the table, see this link.

Following some checking, I've added Dienogest to the list (it is part of the combination product Natazia) and updated the analysis accordingly. Some sources state that there were 21 'New Drugs' for 2010; however, a 'new drug' is not necessarily the same as an NME. There are also some inconsistencies in the FDA approval tables for 2010 at the current time that make counting the NMEs for the year problematic. For example, the tables list everolimus (Zortress) as an NME first approved in the US in 2010, when it was actually first approved in 2009 as Afinitor, under a different tradename and for a different indication. The raw approval data from the FDA is in a series of monthly charts, accessible here (unfortunately, there is no easy, web-friendly way to provide a set of useful links; you'll just have to type in the months). In these tables you should look for the 1s, which mark the new NMEs. As you will see, quite a few entries are unassigned.


UPDATE: One of the potentially new NMEs of last year is incobotulinumtoxinA (trademark:Xeomin). This is a type A botulinum toxin, in the same class as abobotulinumtoxinA (trademark:Dysport, Reloxin, Azzalure) and onabotulinumtoxinA (trademark:BOTOX). These are essentially identical from an active component perspective (the USAN statements are abobotulinumtoxinA, incobotulinumtoxinA, and onabotulinumtoxinA) and the sequences are essentially identical. Because botulinum toxin products are so potent, and because potency differs between production/processing routes, the convention is to assign different USANs to highlight the non-bioequivalence of different products. This is part of a broader issue of assigning bioequivalence of biological drugs, which has exercised drug producers, regulators, and consumers over recent years. Since we are mostly interested in drugs differentiated by differing molecular structures, we do not consider these to be distinct NMEs, and so incobotulinumtoxinA is not counted in our analysis as a new NME. A similar issue occurred last year.

Another interesting case for a new 2010 biological drug is Collagenase Clostridium histolyticum (approved in the US in 2010 as Xiaflex), which is a defined composition mixture of two bacterial collagenase gene products. Xiaflex is dosed parenterally. In 2004 Santyl was approved as a topical drug for wound debridement; the active ingredient in Santyl is ‘Collagenase clostridium histolyticum’, produced by an entirely different process. It would appear from cursory literature analysis that Santyl has non-articulated composition (this is not the same as having a variable or non-specific composition, just that the components are not in a defined composition in the easily accessible public regulatory documents). There are clear developmental and safety differences between a topically dosed ‘local’ agent (Santyl), and an agent that has full exposure to the circulatory and immune system (Xiaflex), and they serve different patient populations, have different indications, etc. They are clearly non-substitutable in a clinical setting.

So, how does one treat this case? Should Xiaflex be considered as actually two new NMEs (the independent and related products of the related ColG and ColH gene products, which is actually what the USAN references) towards drug approval innovation numbers, or should it be subsumed under the previous approval of Collagenase Clostridium histolyticum for Santyl? We have taken the view, from the perspective of the approval of ‘new NMEs’, that Xiaflex contains a previously approved active ingredient. Others will take different views.

More broadly, it is of interest to examine the USAN definition for Xiaflex. It contains two distinct chemical components (the two sequence-related collagenase proteins) in a simple mixture. There is nothing special about the mixture: for example, the components are not a defined-composition obligate heterodimer, and they will be separable from the drug substance via straightforward routes under native, physiological-like conditions. Some small molecule USANs contain multiple molecules, but these are invariably salts, and in cases where there are two (or more) active ingredients in a small molecule drug, they are typically assigned separate USANs. Furthermore, the convention now is to assign a USAN for the parent small molecule, as well as for each distinct salt, even if the salt is the only component in an approved product. This is in line with the INN model (where salts are not usually assigned distinct INNs). Logically, to us, from an informatics perspective, it would make sense to assign USANs for Xiaflex at the level of the distinct proteins, and then for Xiaflex to be a ‘product’ containing two USANs as a defined mixture, in the same way that many small molecule mixture drugs are defined. Anyway, the informatics representation of biological drugs, and the concepts of bioequivalence and differences in post-translational processing (proteolytic maturation, N- and O-linked glycosylation, etc.), may seem to be a semantic discussion, but it does have important commercial and healthcare implications. This issue will no doubt keep many drug discoverers, regulators, and intellectual property staff employed for some time, and hopefully will eventually bring improved, cheaper and continually innovative healthcare to all.

Stepping back even further… Given that current drug naming processes and ‘business rules’ were developed at a time when the complexities of biological drugs were not imagined, and also before a time of electronic databases, and the benefits of the application of controlled vocabularies, dictionaries and ontologies were really appreciated - it is interesting to reflect on how it would be done nowadays if starting from scratch. More of this in a future post (maybe).

In final summary, the number of molecularly novel drugs that were approved in the US last year is between 19 and 22, with the difference being in the way that biological drugs are treated!

Tuesday, 21 December 2010

A local meeting for local people


Andreas Bender from the Unilever Center in Cambridge has set up a series of meetings for networking of molecular modellers, chemoinformaticians and allied trades in the Cambridge, UK, area. The meetings alternate between the University in Cambridge and the EBI. These have been great fun so far, and there is now a web-site for the Cambridge Cheminformatic Network, with a Doodle Poll for dates for the 2011 meetings.

Monday, 20 December 2010

Thursday, 16 December 2010

Links to ChEMBL data from Wikipedia


Everyone's favorite free encyclopedia, Wikipedia, has started to get links to ChEMBL added. The links are all sorted out; now we just need a bot to go round and sort everything out for a large set of compounds. So, to give you some idea of how it will look, here are some links to some classics.

Wednesday, 15 December 2010

ChEMBL now has a Japanese language Wikipedia page



ChEMBL (ケンブル) is now on Japanese Wikipedia.


Note, the page is currently flagged as spam, so hopefully this will change when it is reviewed....

Tuesday, 14 December 2010

ChEMBL Bioactivity Course February 2011 - Registration Open



Registration for the 2011 residential ChEMBL Training course, running from the 14th to the 18th of February 2011, is now underway; further details can be found at this link.

What better way to spend Valentine's Day?

Monday, 13 December 2010

ARMC Vol 45 is out!


One of the highlights of the year for me is the gentle thud of ARMC (Annual Reports in Medicinal Chemistry) landing in my pigeon hole at work. This book alone, I consider, justifies my ACS membership dues. Volume 45 is edited by John Macor, and has some excellent chapters; highlights for me were the TLR review, the epigenetic targets, the neglected tropical diseases and the drug attrition chapters. Lots and lots to keep you occupied on your commute (unless you drive).

%J Annu. Rep. Med. Chem.
%V 45
%E J.E. Macor
%I Academic Press
%O ISBN 978-0-12-380902-5
%O ISSN 0065-7743

Monday, 6 December 2010

A conference-goer's bag of drugs?



I was musing on the drugs I need to take with me while travelling. Here is the first go at a list. This trip, I packed only one item.....

  • DEET (for pesky insects that love your blood even more than you do - thanks Dr. Congreve!).
  • Aspirin (for the inevitable headache and fever, and as a bonus an antiplatelet agent for flights, especially in economy seats - one of Nature's true gifts - aspirin that is, not economy class seats).
  • Paracetamol (headache from sporadic/atypical alcohol consumption).
  • Loperamide (disturbed digestive habits, fluid loss).
  • Azithromycin (just in case of serious life threatening infections, often difficult to get abroad, use responsibly - thanks Dr. Schoichet!).
  • Loratadine (anti-histamine, insect bites, bed-bugs, allergies, etc).
  • Diphenhydramine (a sedative anti-histamine for disturbed sleep patterns, or to block out the presence of your annoying fidgety neighbour on the plane).
  • Ranitidine (for inevitable over-indulgence and subsequent disturbed digestion).
  • Neomycin Sulfate, Polymyxin B, Bacitracin Zinc, Pramoxine (for itches, cuts, grazes and other stuff).
  • A quality sunblock (so obvious, it goes without saying).
Of course, you should use common sense and judgement in such matters, and also take the advice of your personal doctor or physician ;)

Thursday, 2 December 2010

Update of Chemistry Follow-up on GSK Malaria HTS set



Here is an update on some follow-on chemistry on the GSK Malaria screening set, provided by the researchers at GSK's Tres Cantos Medicines Development Campus.

Chemistry in progress

TCMDC-123822, TCMDC-123823, TCMDC-123824, TCMDC-123827, TCMDC-134513, TCMDC-123579, TCMDC-123582, TCMDC-125454, TCMDC-134141, TCMDC-134142, TCMDC-134143, TCMDC-134692, TCMDC-135254, TCMDC-135271, TCMDC-135426, TCMDC-135461, TCMDC-135462, TCMDC-135463, TCMDC-135554, TCMDC-135654, TCMDC-135655, TCMDC-135656, TCMDC-135657, TCMDC-135677, TCMDC-135687, TCMDC-135789, TCMDC-135796, TCMDC-135816, TCMDC-135911, TCMDC-136013, TCMDC-136014, TCMDC-136015, TCMDC-136016, TCMDC-136051, TCMDC-136060, TCMDC-136134, TCMDC-136185, TCMDC-136188, TCMDC-136303, TCMDC-139046

Chemistry on hold

TCMDC-123540, TCMDC-123620, TCMDC-123685, TCMDC-123755, TCMDC-123796, TCMDC-123825, TCMDC-124434, TCMDC-124501, TCMDC-125331, TCMDC-125334, TCMDC-125650, TCMDC-134557, TCMDC-134672, TCMDC-134674, TCMDC-134675, TCMDC-135196, TCMDC-135265, TCMDC-135371, TCMDC-135510, TCMDC-135533, TCMDC-135572, TCMDC-135650, TCMDC-135652, TCMDC-135659, TCMDC-135672, TCMDC-135684, TCMDC-135696, TCMDC-135772, TCMDC-135797, TCMDC-135803, TCMDC-135804, TCMDC-136325, TCMDC-136326, TCMDC-136328, TCMDC-136760, TCMDC-136761, TCMDC-136807

Chemistry abandoned

TCMDC-123822, TCMDC-123823, TCMDC-123824, TCMDC-123827, TCMDC-134513

We'll put this data into the ChEMBL-NTD microsite shortly, but if any other groups have updates on the Malaria screening data disclosures, or various analyses that we could point to, we'd be very happy to hear.

If you want to link directly to a particular TCMDC number you can use a url of the form...

https://www.ebi.ac.uk/chemblntd/system_search?qString=TCMDC-136165
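If you need links for a whole list of TCMDC numbers, the URL pattern above is easy to generate programmatically. The following is a minimal Python sketch; the function name `tcmdc_url` is made up for this example, and the only thing taken from the post is the base URL and the `qString` parameter.

```python
from urllib.parse import urlencode

# Base of the search URL quoted in the post.
BASE_URL = "https://www.ebi.ac.uk/chemblntd/system_search"


def tcmdc_url(tcmdc_id: str) -> str:
    """Build a ChEMBL-NTD search link for one TCMDC identifier."""
    # urlencode escapes any characters that are not safe in a query string.
    return f"{BASE_URL}?{urlencode({'qString': tcmdc_id})}"


for tcmdc_id in ["TCMDC-136165", "TCMDC-123822"]:
    print(tcmdc_url(tcmdc_id))
```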

Tuesday, 30 November 2010

canSAR v1.0 launched

Yesterday, Cancer Research UK issued a press release announcing the launch of the first full version of canSAR - the Institute of Cancer Research's integrated cancer research and drug discovery resource. canSAR integrates large volumes of disparate data covering most aspects of cancer biology and chemistry, and is an example of how to complement the ChEMBL database with therapeutic area specific knowledge. canSAR integrates biological annotation, gene expression, RNA interference studies, structural biology and protein interaction network data - as well as chemical and pharmacological data. It contains annotation on the entire human proteome, and contains >8 million experimental data points including RNAi and chemical screening data. For full release notes please see canSAR news. canSAR is updated monthly. As well as the wonderful ChEMBL, the data in canSAR comes from a large number of sources, including ArrayExpress, PDBe, ROCK, STRING, Genomics of Drug Sensitivity in Cancer, COSMIC, BindingDB, SCOP and Pfam. We (at the ICR) are grateful to our friends at all these places for their help. In the new year, we will be holding a series of webinars and walkthroughs, and details of these will be posted on the ChEMBL-og.

Sunday, 28 November 2010

Any old unwanted SGI dial boxes?


I am doing a surprising amount of molecular graphics stuff at the moment, and have finally realised I don't have the skills or coordination to use a mouse/keyboard to do simple rotations/scaling/clipping etc. It turns out it's possible to connect up a dial box to a MacBook Pro, with a little bit of messing around with drivers. So does anyone have an old, unwanted dial box for an SGI machine? Part numbers 9980991, 9980992, and 9780804 are all apparently OK. If you have one of these, let me know.

Monday, 22 November 2010

2010 New Drug Approvals - Pt. XVIII - Tesamorelin (Egrifta)


ATC code (partial): H01AC

Also this month, on November 10th, the FDA approved Tesamorelin under the trade name Egrifta. Tesamorelin (research code:TH-9507) is an analog of the human growth hormone-releasing factor (GRF) (UniProt:P01286, synonym:Somatoliberin, synonym:GRF, synonym:GHRH) indicated for the reduction of excess abdominal fat in HIV-infected patients with lipodystrophy. Lipodystrophy is a condition in which excess fat develops in atypical areas of the body, most notably around the liver, stomach, and other abdominal organs. This condition is observed as a side effect with many antiretroviral drugs used to treat HIV. Tesamorelin is the first FDA-approved treatment specifically for lipodystrophy.

The -relin INN stem covers prehormones or hormone releasing peptides, a very broad range of targets and pharmacology. The -morelin stem sub-group covers growth hormone-release stimulating peptides including capromorelin, dumorelin, examorelin, ipamorelin, pralmorelin, rismorelin, sermorelin, somatorelin, and tabimorelin.


Tesamorelin is an N-terminally modified variant of the natural 44 residue long peptide, Somatoliberin, which is a hypothalamic peptide, acting on the pituitary somatotroph cells to stimulate the synthesis and pulsatile release of endogenous growth hormone (GH), which is both anabolic and lipolytic. Somatoliberin is a member of the glucagon family (Pfam:PF00123) of endogenous peptide ligands. Tesamorelin exerts its therapeutic effects by binding to, and acting as an agonist of, GHRHr - a type-2 (or class B, or secretin-like) GPCR (UniProt:Q02643, ChEMBL:CHEMBL158, Pfam:PF00002) on pituitary somatotrophs. The released growth hormone (GH) in turn acts on a variety of target cells, including chondrocytes, osteoblasts, myocytes, hepatocytes and adipocytes, resulting in a host of pharmacodynamic effects, which are primarily mediated by insulin-like growth factor 1 (IGF-1) produced in the liver and in peripheral tissues.

Tesamorelin has a molecular weight of 5135.9 Da. Absolute bioavailability following s.c. dosing is less than 4%, with a volume of distribution of 10.5 L/kg and an elimination half-life (t1/2) of 38 minutes (both in HIV-infected patients). The recommended dosage is 2 mg injected subcutaneously daily (typically in the abdomen); a typical daily dose is therefore 0.39 umol.
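The molar dose quoted above follows directly from the mass dose and the molecular weight. Here is a minimal Python sketch of that conversion; the helper name `mg_to_umol` is made up for this example, and the two input values are the ones given in the post.

```python
MOLECULAR_WEIGHT = 5135.9  # g/mol (Da), as quoted in the post
DAILY_DOSE_MG = 2.0        # recommended subcutaneous daily dose


def mg_to_umol(mass_mg: float, mol_weight: float) -> float:
    """Convert a mass in mg to micromoles: mg / (g/mol) = mmol, times 1000 = umol."""
    return mass_mg / mol_weight * 1000.0


daily_dose_umol = mg_to_umol(DAILY_DOSE_MG, MOLECULAR_WEIGHT)
print(f"{daily_dose_umol:.2f} umol per day")
```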



trans-3-hexenoyl-YADAIFTNSYRKVLGQLSARKLLQDIMSRQQGESNQERGARARL-NH2

Tesamorelin is produced synthetically, and its amino acid sequence is identical to that of human Somatoliberin/GRF. It is modified by attachment of a 3-hexenoyl moiety via an amide linkage to the N-terminal tyrosine residue. This chemical modification blocks proteolytic degradation by endogenous proteases such as DPP-IV, thus prolonging the half-life of the peptide (the inhibition of DPP-IV is itself the basis of a number of therapies for the treatment of type-II diabetes - the gliptins). A further chemical modification is the C-terminal amidation; this p.t.m. is also found in the naturally produced peptide. Tesamorelin is closely chemically related to a number of other clinical agents, such as Sermorelin (which is a shorter, but still active, version of Somatoliberin/GRF).

The full prescribing information can be found here.

The license holder is EMD Serono, Inc. and the product website is www.egrifta.com.

Domain-level annotation of binding-sites for ligands within ChEMBL

One big problem with simple sequence searching of ChEMBL using tools like BLAST is the introduction of contextually incorrect target relationships due to matching of irrelevant domains. For example, imagine a protein, X, that contains two domains of types A and B, and a second protein, Y, which also contains two domain types, B and C. If the ligand is known to bind to domain type A, there is no ligand-binding relationship between X and Y; however, if the ligand binds at domain type B in X, then there is a relevant relationship between X and Y. This may sound like a rare example, but it is surprisingly common (and extremely annoying), since the majority of eukaryotic proteins are multi-domain, and the presence of certain domains, such as an EGF-like domain (Pfam:PF00008), can greatly complicate the analysis of sequence searches. What is really needed is a reliable mapping (or more generally a probabilistic score) of the ligand-binding domains within a particular protein.

Enough of all these Xs and Ys! Here is a real example, for three interesting proteins, Axl, Lck and SOCS3. As you can see, protein kinase domain inhibitors are only 'transferable' between Axl and Lck, while SH2 binders are only 'transferable' between Lck and SOCS3.
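To make the idea concrete, here is a minimal Python sketch of domain-aware 'transferability' for the three proteins above. It is an illustration only, not the classifier used for the ChEMBL annotation: the domain sets are simplified and hand-entered for this example, and the names `DOMAINS` and `transferable_targets` are hypothetical.

```python
# Simplified, illustrative domain compositions (an assumption for this sketch;
# real Pfam annotations carry more detail, e.g. repeated domains and positions).
DOMAINS = {
    "Axl": {"Ig-like", "FN3", "Protein kinase"},
    "Lck": {"SH3", "SH2", "Protein kinase"},
    "SOCS3": {"SH2", "SOCS box"},
}


def transferable_targets(source: str, binding_domain: str) -> list[str]:
    """Return the other proteins that share the domain the ligand binds in `source`."""
    if binding_domain not in DOMAINS[source]:
        raise ValueError(f"{source} has no {binding_domain} domain")
    return sorted(
        name
        for name, domains in DOMAINS.items()
        if name != source and binding_domain in domains
    )


# A kinase-domain inhibitor of Lck is only relevant to Axl,
# while an SH2 binder of Lck is only relevant to SOCS3.
print(transferable_targets("Lck", "Protein kinase"))
print(transferable_targets("Lck", "SH2"))
```

A plain sequence search would relate Lck to both Axl and SOCS3 regardless of where the ligand binds; keying the lookup on the binding domain is what removes the irrelevant match.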




Here is a graph (as a pie chart) of the Pfam domains for the ligand binding regions of all the protein targets in the current (ChEMBL_08) target dictionary. The annotation was performed by a simple classifier heuristic, and we are validating the accuracy of this approach at the moment, but it appears to be largely correct. Once we're happy with the results, we'll add the ligand-binding-domain data to the target dictionary.

Sunday, 21 November 2010

Do you want to know about the ChEMBL Database User Group meeting?

The ChEMBL Database User Group on LinkedIn is perilously close to 100 members - in fact we need just one more to make it to the magic century! We have found a well known industry figure to help organise our first User Group meeting, and we'll start posting details shortly on the LinkedIn group site.

What Are The Key Clinical Candidate Disclosure Meetings?


Here is a call for assistance; all input will end up published on The ChEMBL-og, and accessible to one-and-all. What we're looking for is a pretty comprehensive list of key clinical candidate disclosure meetings, ideally those with disclosure of chemical structure, functional assay, pharmacokinetic and toxicology data - you know the sort of meeting, where the key data on a hot compound is disclosed for the first time.

I've put together a preliminary list here, from memory and a little bit of googling - this is far, far, far from perfect and is woefully incomplete, and I am now looking for extra meetings for the areas not covered, and some highlighting of additional ones. As you will see I've used the ATC classification for the structure of the list - although this is not perfect for things like anti-microbials, etc., it is actually a pretty good framework to hang this off.

If you have any suggestions, please mail them in....

Finally, if you are interested in hearing our plans, and maybe collaborating on some informatics aspect of this, feel free to contact us.

Friday, 19 November 2010

2010 New Drug Approvals - Pt. XVII - Eribulin Mesylate (Halaven)





ATC code (partial): L01C

On November 15th, 2010, the FDA approved Eribulin Mesylate (ResearchCode:E-7389) under the trade name Halaven (TradeMark:Halaven). It is indicated for the treatment of patients with late stage, metastatic breast cancer who have previously received at least two chemotherapeutic regimens for the treatment of metastatic disease. Phase III trials showed that patients survived a median of 2.5 months longer than patients treated with other current alternatives. Eribulin is a synthetic analogue of halichondrin B, a cytotoxic polyether macrolide marine natural product.

The mechanism of action of Eribulin is anti-mitotic and is mediated via tubulin binding, where it leads to G2/M block in the cell-cycle; after prolonged stalling in this state, cells enter apoptosis and are then cleared.


Eribulin is a large (Mwt 729.9 for Eribulin and 826.0 for the mesylate salt) synthetic compound (an analogue of halichondrin B). An IUPAC name of the structure is 11,15:18,21:24,28-Triepoxy-7,9-ethano-12,15-methano-9H,15H-furo[3,2-i]furo[2',3':5,6]pyrano[4,3-b][1,4]dioxacyclopentacosin-5(4H)-one, 2-[(2S)-3-amino-2-hydroxypropyl]hexacosahydro-3-methoxy-26-methyl-20,27-bis(methylene)-, (2R,3R,3aS,7R,8aS,9S,10aR,11S,12R,13aR,13bS,15S,18S,21S,24S,26R,28R,29aS)-, methanesulfonate (salt). The most striking part of the structure is the highly fused, rigid ring system; as you would expect, the synthesis is complicated. The structure contains many of the classical features of natural products - a high number and fraction of defined chiral centers, a high ratio of oxygens to nitrogens, and a high ring count.

The recommended dosing is 1.4 mg/m2 as two intravenously delivered doses, separated by seven days, repeated after a further two weeks. An average adult human has a body surface area of ca. 1.8 m2, so this would equate to a single dose of ~3 umol. The mean half-life of Eribulin is ~40 hr, with a mean volume of distribution of ~80 L/m2, and a mean clearance of ~1.8 L/hr/m2. Plasma protein binding is around 58%. Eribulin is metabolically stable and is largely unmetabolised, with the majority of the dosed drug being excreted as the dosed form in the feces.
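The approximate molar dose above can be reproduced with a couple of lines of arithmetic. This Python sketch assumes that the 1.4 mg/m2 dose is expressed as the mesylate salt (Mwt 826.0); that choice is an assumption of this example, and using the free base weight (729.9) instead gives a slightly higher figure that still rounds to roughly 3 umol.

```python
DOSE_MG_PER_M2 = 1.4        # recommended dose per administration
BODY_SURFACE_AREA_M2 = 1.8  # approximate adult value used in the post
MW_MESYLATE = 826.0         # g/mol; assumes the dose is expressed as the salt

# Scale the per-area dose to a whole-body dose, then convert mg to umol.
dose_mg = DOSE_MG_PER_M2 * BODY_SURFACE_AREA_M2
dose_umol = dose_mg / MW_MESYLATE * 1000.0

print(f"{dose_mg:.2f} mg per administration, about {dose_umol:.1f} umol")
```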

Eribulin binds at (or near) the vinca domain of tubulin, a region that is located at the interface of two tubulin heterodimers when arranged end to end and overlaps the exchangeable GTP site on β-tubulin (Bai et al). β-tubulin is a small family of related human proteins (PFAM:PF03953, HOMSTRAD:tubulin, and UniProt:P07437 for a specific member) that are key components of microtubules. There are multiple isoforms of β-tubulin, e.g. "tubulin-beta1" (ChEMBLDB_ID:CHEMBL1915, canSAR link) and "tubulin-beta5" (ChEMBLDB_ID:CHEMBL5444, canSAR link). Multiple 3-D structures are available for alpha-/beta-tubulins including PDBe:1tub. Tubulins are the target of several other classes of anticancer drugs, such as Paclitaxel (aka taxol) and Vinblastine (both similarly cytotoxic natural products).

 
NAME="Eribulin Mesylate"
TRADEMARK_NAME="Halaven"
ATC_code= L01C
SMILES="CO[C@@H]([C@@H](C[C@H](O)CN)O1)[C@@H](CC(C[C@@H]2O[C@@]([C@H]3C4[C@@]([C@@H]5[C@@H](C6)O4)([H])O7)([H])[C@]7([H])CC2)=O)[C@@H]1C[C@@H](O[C@@H](CC[C@H]8C(C[C@H](CC[C@]6(O5)O3)O8)=C)C[C@H]9C)C9=C.CS(O)(=O)=O"
InChI="/C40H59NO11.CH4O3S/c1-19-11-24-5-7-28-20(2)12-26(45-28)9-10-40-17-33-36(51-40)37-38(50-33)39(52-40)35-29(49-37)8-6-25(47-35)13-22(42)14-27-31(16-30(46-24)21(19)3)48-32(34(27)44-4)15-23(43)18-41;1-5(2,3)4/h19,23-39,43H,2-3,5-18,41H2,1,4H3;1H3,(H,2,3,4)/t19-,23+,24+,25-,26+,27+,28+,29+,30-,31+,32-,33-,34-,35+,36+,37+,38?,39+,40+;/m1./s1/i1-12,2-12,3-12,4-12,5-12,6-12,7-12,8-12,9-12,10-12,11-12,12-12,13-12,14-12,15-12,16-12,17-12,18-12,19-12,20-12,21-12,22-12,23-12,24-12,25-12,26-12,27-12,28-12,29-12,30-12,31-12,32-12,33-12,34-12,35-12,36-12,37-12,38-12,39-12,40-12,41-14,42-16,43-16,44-16,45-16,46-16,47-16,48-16,49-16,50-16,51-16,52-16;1-12,2-16,3-16,4-16,5-32"
ChemDraw=eribulin.cdx

Full prescribing information here. The license holder for Halaven™ is Eisai Inc.

Monday, 15 November 2010

ChEMBL_08 Released

We are pleased to announce the release of chembl_08. This version of the ChEMBL database was prepared on 26th October 2010 and contains:
  • 735393 compound records
  • 636269 compounds (of which 635933 have molfiles)
  • 488898 assays
  • 2973034 activities
  • 8088 targets
  • 38462 publications
  • 5 activity data sources
You can also download the ChEMBL database (Oracle 9i, 10g, 11g or MySQL) from our ftp site: ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/.

Changes to the database (please see release notes for more detail):
  1. FDA approved drugs have now been added to the compounds table*. Some drugs (e.g., biotherapeutics) do not have a structure/molfile, and not all drugs have bioactivity data associated with them. Further information for these drugs (e.g., mechanism of action) will be added in subsequent releases.
  2. Parent compounds have been generated by removing the salt component from any compounds tested as a salt form. Both the parents and the salt forms are recorded in the compounds table and a new table: molecule_hierarchy shows the relationship between them.
  3. ChEMBL identifiers (chembl_id) have been added to the compounds, target_dictionary, assays and docs tables. These take the form 'CHEMBL' followed immediately by an integer (e.g., CHEMBL941) and are used on the interface. Small molecules within the database will still have a ChEBI ID, and protein targets a UniProt accession, in addition.
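As an illustration of the identifier format described in item 3, here is a minimal Python sketch that checks whether a string looks like a chembl_id. It only encodes the rule stated above ('CHEMBL' followed immediately by an integer); the function name `is_chembl_id` is made up for this example, and whether leading zeros are ever issued is not stated in the release notes, so the pattern simply accepts any run of digits.

```python
import re

# 'CHEMBL' followed immediately by one or more digits, and nothing else.
CHEMBL_ID = re.compile(r"CHEMBL[0-9]+")


def is_chembl_id(text: str) -> bool:
    """Return True if `text` has the form CHEMBL<integer>, e.g. CHEMBL941."""
    return CHEMBL_ID.fullmatch(text) is not None


for candidate in ["CHEMBL941", "CHEMBL", "chembl941", "CHEMBL 941"]:
    print(candidate, is_chembl_id(candidate))
```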
You can access the data via the ChEMBL database interface: http://www.ebi.ac.uk/chembldb/index.php.

Changes to the interface:
  1. The interface now uses chembl_id for compounds, assays and targets. Old URLs (e.g., using chebi_id/assay_id/tid) will continue to work, however we recommend using the chembl_id when linking to the ChEMBL interface.
  2. The compound, target and assay report card pages now include interactive pie charts to allow users to link to related data sets in the ChEMBL database e.g. https://www.ebi.ac.uk/chembldb/index.php/compound/inspect/CHEMBL41
  3. Compound report card page has been updated to include the drug icons, for FDA approved molecules* in ChEMBL e.g. https://www.ebi.ac.uk/chembldb/index.php/compound/inspect/CHEMBL941
The ChEMBL Team

*The identification and loading of the FDA approved compounds in the ChEMBL database is part of a larger process of integrating drug and clinical candidate information into the ChEMBL database. This process has not yet been completed, so please expect enhancements to the underlying schema and interface in future releases of the ChEMBL database.

Saturday, 13 November 2010

SMR Meeting - Trends in Medicinal Chemistry - 9th December 2010


The next Society for Medicines Research meeting is on the 9th December 2010 at the National Heart and Lung Institute, Kensington, London. These are my favourite day meetings: cheap, well organised and very applied to actual drug discovery. There are some great talks at this meeting.

Unfortunately, I cannot go to the SMR meeting - I will be ill (probably), in a hotel (hopefully), maybe on the beach (certainly) at the Zing Structural Biology Conference. The graph above answered an important question for me.

Thursday, 11 November 2010

2010 New Drug Approvals - Pt. XVI - Ceftaroline Fosamil (Teflaro)



ATC code (partial): J01DI

On October 29th, the FDA approved Ceftaroline Fosamil under the trade name Teflaro. Ceftaroline Fosamil (previously known by the research code TAK-599; the parent drug, Ceftaroline, is also known as T-91,825) is an antibiotic indicated for the treatment of adults with acute bacterial skin and skin structure infections (ABSSSI) caused by susceptible Gram-positive and Gram-negative microorganisms, such as Staphylococcus aureus (including methicillin-susceptible and -resistant isolates), Streptococcus pyogenes, Streptococcus agalactiae, Escherichia coli, Klebsiella pneumoniae, and Klebsiella oxytoca, and also for the treatment of community-acquired bacterial pneumonia (CABP) caused by susceptible Gram-positive and Gram-negative bacteria, such as Streptococcus pneumoniae (including cases with concurrent bacteremia), Staphylococcus aureus (methicillin-susceptible isolates only), Haemophilus influenzae, Klebsiella pneumoniae, Klebsiella oxytoca, and Escherichia coli.

Ceftaroline Fosamil is a semisynthetic antibacterial of the cephalosporin class of beta-lactams, which were originally identified in 1948 from the fungus Cephalosporium (now Acremonium). Ceftaroline Fosamil is the phosphamide prodrug of the bioactive Ceftaroline. Like other drugs in the same class, the bactericidal action of Ceftaroline is mediated through covalent binding to essential penicillin-binding proteins (PBPs) in the bacterial cell wall. In particular, ceftaroline is bactericidal against S. aureus, including methicillin-resistant S. aureus (MRSA), due to its affinity for PBP2a (UniProt: Q53707, ChEMBL: 19669), the type of PBP produced by MRSA and not well inhibited by other antibiotics such as methicillin (ChEMBL: 116716), oxacillin (ChEMBL: 156432), penicillin, and amoxicillin (ChEMBL: 657723). Ceftaroline is also active against S. pneumoniae due to its affinity for PBP2x (UniProt: P14677, ChEMBL: 102467).

Ceftaroline Fosamil is a large 'small-molecule' semisynthetic prodrug (Molecular Weight of 685.7 g.mol-1 for Ceftaroline Fosamil itself and 762.7 g.mol-1 for the monoacetate salt), slightly lipophilic and soluble in water. Following injection, Ceftaroline Fosamil has a volume of distribution of 20.3 L, a low plasma protein binding (ppb) of 20%, an elimination half-life of 1.6 hr and a plasma clearance of 9.58 L/hr. Ceftaroline Fosamil is primarily eliminated by the kidneys (88% of the dose is recovered in urine) and mainly as the active metabolite ceftaroline (64% as ceftaroline and 2% as an inactive metabolite). Ceftaroline is not an inhibitor or substrate of the major cytochrome P450 isoenzymes. The recommended dosage of Ceftaroline Fosamil is 600 mg every 12 hours by intravenous infusion administered over an hour.

The full prescribing information can be found here. Like other cephalosporins, the Ceftaroline Fosamil structure, (6R,7R)-7-{(2Z)-2-(ethoxyimino)-2-[5-(phosphonoamino)-1,2,4-thiadiazol-3-yl]acetamido}-3-{[4-(1-methylpyridin-1-ium-4-yl)-1,3-thiazol-2-yl]sulfanyl}-8-oxo-5-thia-1-azabicyclo[4.2.0]oct-2-ene-2-carboxylate, contains a cyclic amide (the beta-lactam ring) fused with a six-membered ring (together forming the cephem nucleus). Another notable feature of Ceftaroline Fosamil is the thiazolylthio group, which is thought to be crucial for the activity against MRSA.

NAME="Ceftaroline Fosamil"
TRADEMARK_NAME="Teflaro"
ATC_code= NA
SMILES="CCO\N=C(/C(=O)N[C@H]1[C@H]2SCC(=C(N2C1=O)C(=O)O)Sc3nc(cs3)c4cc[n+](C)cc4)\c5nsc(NP(=O)(O)O)n5"
InChI="InChI=1S/C22H21N8O8PS4/c1-3-38-26-13(16-25-21(43-28-16)27-39(35,36)37)17(31)24-14-18(32)30-15(20(33)34)12(9-40-19(14)30)42-22-23-11(8-41-22)10-4-6-29(2)7-5-10/h4-8,14,19H,3,9H2,1-2H3,(H4-,24,25,27,28,31,33,34,35,36,37)/p+1/b26-13-/t14-,19-/m1/s1"
ChemDraw=Ceftaroline_Fosamil.cdx

The license holder is Forest Pharmaceuticals, Inc. and the product website is www.teflaro.com.

Small Molecules Bioactivity Course - February 2011


Registration for the 2011 residential ChEMBL Training course, running from the 14th to the 18th of February, is now underway; further details can be found at this link.

What better way to spend Valentine's Day?

The picture above is of the ingredients in a Twinkie, apparently.

Wednesday, 10 November 2010

Staff Position in ChEMBL - EU-OPENSCREEN database developer


The EBI recruitment website now has the EU-OPENSCREEN developer position detailed. Closing date is the 12th December 2010.

The job is an exciting opportunity to work on establishing a pan-European archive of academic screening data.