Monday, 14 May 2012

ChEMBL Webinar 16th May 'Schema & SQL Querying' - Posted by Louisa




This is a last call for people wanting to sign up for the "Schema & SQL Querying" webinar that will be hosted this Wednesday 16th May at 3.30pm (BST).

It will be a 45-minute webinar that will take you through the ChEMBL schema and show you how to use SQL queries to extract data from the database.

Remember to register your interest in our webinars on the Doodle Poll. Make sure that you leave your **email address** as well as your name so that we can send the connection details to you. Any problems, please contact chembl-help@ebi.ac.uk.

For those of you who can't make it to this webinar, we will be hosting it again on the 27th June.

Thursday, 10 May 2012

USAN Watch - May 2012

The USANs for May 2012 have just been published. 


Update: It looks like the list is now published first, with a few more names added afterwards. I've captured these as I notice them, but it's not ideal.....


USAN | Research Code | Drug Class | Therapeutic Class | Target
asparaginase Erwinia chrysanthemi | crisantapase (INN) | enzyme | therapeutic | n/a
cindunistat hydrochloride maleate | PHA-728669F | synthetic small molecule | therapeutic | NOS?
enzalutamide | MDV-3100 | synthetic small molecule | therapeutic | AR
flutemetamol F18 | [18F]AH-110690 | synthetic small molecule | imaging agent |
sodium glycerophosphate | | natural product derived small molecule | supplement | n/a
lasmiditan, lasmiditan succinate | LY-683974, COL-144 | synthetic small molecule | therapeutic | 5HT1F
lifitigrast, lifitigrast sodium | SAR-1118-023 | synthetic small molecule | therapeutic | LFA-1 integrin
neceprevir sodium | ACH-0142684.Na, ACH-2684.Na | synthetic small molecule | therapeutic | HCV NS3
nintedanib, nintedanib esylate | BIBF-1120 | synthetic small molecule | therapeutic | FGFR, PDGFR, VEGFR
serelaxin | RLX-030 | protein | therapeutic | RXFP1, RXFP2
trenonacog alfa | Factor IX | enzyme | therapeutic | n/a
vercirnon sodium | GSK-1605786A | synthetic small molecule | therapeutic | CCR9

Monday, 7 May 2012

New Drug Approvals 2012 - Pt. XI - Taliglucerase alfa (Elelyso™)






ATC code: A16AB11
Wikipedia: Taliglucerase alfa
 

On May 1, the FDA approved taliglucerase alfa for the treatment of Type I Gaucher's disease. Gaucher's disease is the most common of the lysosomal storage diseases. It is a hereditary disease caused by a deficiency of the enzyme β-glucocerebrosidase (Uniprot: P04062), also called β-glucosidase. Gaucher's disease is a rare genetic disease with an incidence of 1 in 50,000 births and is considered an orphan disease. Type I Gaucher's disease is about 100 times more common in people of Ashkenazi Jewish descent than in the general North American population. Symptoms of type I Gaucher's disease typically begin in early adulthood and include an enlarged liver, a grossly enlarged spleen, impaired bone structure, anemia and low platelet levels, the last of which leads to prolonged bleeding and easy bruising. If enzyme replacement therapy (ERT) is available, the prognosis for patients with type I Gaucher's disease is good.

β-Glucocerebrosidase is an enzyme of 536 amino acids with a molecular weight of approximately 59.7 kDa. Its gene is located on chromosome 1 (1q21). The enzyme catalyzes the hydrolysis of glucocerebrosides (e.g. ChEBI:18368), a process required for the turnover of the cell membranes of red and white blood cells. In Gaucher's disease, the macrophages clearing these cells fail to metabolize the lipids, accumulating them instead in their lysosomes. Thus, the macrophages turn into dysfunctional Gaucher cells and abnormally secrete inflammatory signals. The deficiency of glucocerebrosidase in Type I Gaucher's disease is only partial and in most cases caused by a mutation replacing asparagine with serine at residue 370 of the protein sequence (N370S). The deficiency of the mutant enzyme can be compensated for by injection of an exogenous replacement enzyme, which drastically improves the prognosis for patients with type I Gaucher's disease. Prior to the approval of taliglucerase alfa, imiglucerase and velaglucerase alfa were already available as ERTs for type I Gaucher's disease. The graphic below illustrates the reaction catalyzed by β-glucocerebrosidase and by the ERTs. The enzyme classification code for β-glucocerebrosidase is EC 3.2.1.45.



Taliglucerase alfa is a monomeric glycoprotein containing four N-linked glycosylation sites, with a molecular weight of 60.8 kDa. The recombinant enzyme differs from native human glucocerebrosidase by two amino acids at the N terminus and up to seven amino acids at the C terminus. Taliglucerase alfa is decorated with mannose-terminated oligosaccharide chains that are specifically recognized by macrophage receptors and help 'home' the enzyme to its target cells.

Taliglucerase alfa is the first ERT expressed in plant cells (carrot root cells) rather than mammalian cells. Plant cell cultures are a more cost-effective way of expressing recombinant enzymes.


Crystal structure of human glucocerebrosidase (PDBe: 1ogs).


The recommended dose is 60 Units/kg of body weight, administered once every 2 weeks as a 60-120 minute intravenous infusion. One Unit is the amount of enzyme that catalyzes the hydrolysis of 1 micromole of the synthetic substrate para-nitrophenyl-β-D-glucopyranoside (pNP-Glc) per minute at 37°C. Adverse effects include pharyngitis, headache, arthralgia, influenza and back pain.
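As a worked example of the dosing arithmetic, here is a short sketch. The 70 kg body weight is a hypothetical value chosen purely for illustration, not a clinical recommendation, and the function names are my own.

```python
def infusion_units(body_weight_kg: float, units_per_kg: float = 60.0) -> float:
    """Total Units per infusion at the recommended 60 Units/kg."""
    return body_weight_kg * units_per_kg


def assay_turnover_umol_per_min(units: float) -> float:
    """By definition, 1 Unit hydrolyses 1 micromole of pNP-Glc per minute at 37 °C."""
    return units * 1.0


weight_kg = 70.0  # hypothetical patient weight, for illustration only
units = infusion_units(weight_kg)
print(f"{weight_kg:.0f} kg -> {units:.0f} Units per infusion "
      f"(= {assay_turnover_umol_per_min(units):.0f} umol pNP-Glc/min in the reference assay)")
```

For a 70 kg patient this gives 4,200 Units per infusion, i.e. enough activity to hydrolyse 4,200 micromoles of pNP-Glc per minute under the assay conditions that define the Unit.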

Taliglucerase alfa is marketed by Pfizer and Protalix under the brand name Elelyso.


The full prescribing information can be found here.


Sunday, 6 May 2012

Are there around 10^19 Lipinski-like small molecules?


I'm a big fan of the work of Jean-Louis Reymond at the University of Berne, and am starting to imagine a time when the enormity of chemical space can be reasonably comprehensively mapped and explored, at least for 'fragment-sized' molecules. In the field of bioinformatics, the number of possible peptides is considered quite large: for example, for peptides built from the 20 natural amino acids, there are 20^10 possible distinct decapeptides (this is 10,240,000,000,000, or 1.024 x 10^13, which is a big number of course, but not that big), and a decapeptide will have an average molecular weight of about 1,100 Da. For a natural peptide with a molecular weight of 500ish there are only 3.2 million possibilities (20^5 pentapeptides). However, small molecules comprehensively trash these 'biologically constrained' numbers, making cheminformatics, I think, a great frontier and challenge for HPC and "large data".
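The peptide arithmetic is easy to check. A minimal sketch, assuming an average residue mass of about 110 Da (my assumption, not a figure from the post, and ignoring the terminal water) to give rough molecular weights:

```python
# Count distinct linear peptides built from the 20 natural amino acids.
# AVERAGE_RESIDUE_MASS_DA is an assumed round number, used only for rough masses.
AVERAGE_RESIDUE_MASS_DA = 110


def peptide_count(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct sequences of a given length: alphabet_size ** length."""
    return alphabet_size ** length


def approx_peptide_mass(length: int) -> int:
    """Very rough mass estimate: residues x average residue mass (ignores terminal water)."""
    return length * AVERAGE_RESIDUE_MASS_DA


for n in (5, 10):
    print(f"{n}-mer: {peptide_count(n):,} sequences, ~{approx_peptide_mass(n)} Da")
```

This prints 3,200,000 pentapeptides (~550 Da) and 10,240,000,000,000 decapeptides (~1,100 Da), matching the numbers above.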

The GDB databases give some idea of the size of drug-like chemical space. If you take the current GDB databases, and plot the size of the library as a function of the number of heavy atoms...

...you get classic exponential growth: the largest library is so much bigger than the smaller sets that it dominates the total number of compounds. So on a linear-scale plot it looks like this,

but on a log scale it's approximately linear, and a regression can readily be fitted to it.


So, extrapolating to a GDB with 33 heavy atoms (which, at an average heavy-atom mass of 15 Da, corresponds to a molecular weight of around 500) gives about 10^19 to 10^20 distinct molecules. Of course, there are a bunch of assumptions behind the GDB enumeration approach (for example, a limited, but sensibly limited, set of elements). The fraction of Lipinski-compliant molecules within that set is an open question, but even if only 1% are compliant, it doesn't affect this number too much.
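The regression step can be sketched as an ordinary least-squares fit of log10(library size) against heavy-atom count, extrapolated to 33 heavy atoms. The counts below are made-up illustrative values that grow by an arbitrarily chosen factor of 4 per heavy atom; they are not the real GDB figures, so the printed prediction only demonstrates the method. Function names are my own.

```python
import math


def fit_log_linear(heavy_atoms, library_sizes):
    """Least-squares fit of log10(size) = intercept + slope * heavy_atoms."""
    xs = list(heavy_atoms)
    ys = [math.log10(s) for s in library_sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def extrapolate(intercept, slope, heavy_atoms):
    """Predicted library size (linear scale) at a given heavy-atom count."""
    return 10 ** (intercept + slope * heavy_atoms)


# Hypothetical, illustrative counts only (NOT the real GDB numbers).
ha = [7, 8, 9, 10, 11]
sizes = [4.0 ** h for h in ha]

b0, b1 = fit_log_linear(ha, sizes)
print(f"slope = {b1:.3f} log10 units per heavy atom")
print(f"predicted size at 33 heavy atoms: {extrapolate(b0, b1, 33):.2e}")
print(f"approx MW at 33 heavy atoms: {33 * 15} Da")  # 15 Da average heavy-atom mass, as in the text
```

Fitting in log space is what makes the problem linear; swapping in the real per-heavy-atom GDB counts would give the actual estimate.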

10^19 is too big to even think about storing: as SMILES alone it is a zettabyte-scale storage problem. But smart subset sampling, and the ever-growing advances in data compression, processor power and connectivity, will no doubt start to chip away at this challenge of chemical comprehensiveness.

As an aside, a Google search shows that one of the largest storage arrays in the world at the moment is a 150 petabyte system at IBM Almaden, so 1 zettabyte (10^21 bytes) is about 7,000 times the size of this.
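The zettabyte and ~7,000x figures follow from simple arithmetic. A sketch, assuming (my assumption, not the post's) an average of about 50 bytes per SMILES string and decimal SI units:

```python
# Back-of-envelope storage estimate for ~1e19 molecules stored as SMILES.
N_MOLECULES = 1e19
BYTES_PER_SMILES = 50   # assumed average length including a newline; adjust to taste
ZETTABYTE = 1e21        # decimal SI zettabyte, in bytes
PETABYTE = 1e15

total_bytes = N_MOLECULES * BYTES_PER_SMILES
print(f"SMILES storage: {total_bytes / ZETTABYTE:.1f} ZB")

# Ratio of 1 ZB to a 150 PB storage array.
ratio = ZETTABYTE / (150 * PETABYTE)
print(f"1 ZB is ~{ratio:,.0f} times a 150 PB array")
```

With these assumptions the SMILES alone come to about half a zettabyte, and 1 ZB is roughly 6,700 times a 150 PB array.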

Thursday, 3 May 2012

PPI Library - Part 3



It turns out that scientists and the rest of the world interpret 'PPIs' as very different acronyms, as the amount of Payment Protection Insurance comment spam I've had to filter out and delete shows. Anyway, life got in the way of science for a few weeks for me ( :( ), but some more of the PPI work is described here.

A very simple algorithm was applied to build a library of experimental peptide conformers. First, every tetra-peptide was extracted from a protein structure; one of these peptides was then taken as the seed for a conformational cluster, and subsequent tetra-peptides were fitted to this fragment. If the RMSD for the main-chain atoms was lower than a cutoff parameter, the tetra-peptide was assigned to that cluster, with the original 'seed' fragment taken as the cluster's representative. If the RMSD was greater than the cutoff, a new cluster was established, and any subsequent tetra-peptides were fitted to both cluster representatives, and so forth. As more unique peptide conformers are seen (defined according to the RMSD cutoff), the number of clusters increases. Of course, the population of each cluster is stored: some conformers are really common (alpha-helix and beta-strand fragments) and others are rare or experimental errors.
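The clustering described above is a greedy "leader" scheme, and can be sketched with NumPy. This is my own illustration, not the code used here: it assumes the four main-chain atoms N, CA, C and O per residue (16 atoms per tetra-peptide), uses Kabsch superposition for the fit, and, where a fragment fits several representatives, assigns it to the closest one (the post doesn't specify that tie-break). The random fragments in the usage block are stand-in data only.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal superposition (Kabsch)."""
    p_c = p - p.mean(axis=0)          # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c                   # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0  # avoid reflections
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rot.T - q_c          # rotate p onto q, then compare
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def leader_cluster(fragments, cutoff):
    """Greedy leader clustering: each fragment joins the closest representative with
    RMSD below `cutoff` (Angstrom), otherwise it seeds a new cluster.
    Returns (representative indices, cluster populations)."""
    reps: list[int] = []
    pops: list[int] = []
    for i, frag in enumerate(fragments):
        best, best_rmsd = None, float("inf")
        for k, r in enumerate(reps):
            rmsd = kabsch_rmsd(frag, fragments[r])
            if rmsd < best_rmsd:
                best, best_rmsd = k, rmsd
        if best is not None and best_rmsd < cutoff:
            pops[best] += 1
        else:
            reps.append(i)
            pops.append(1)
    return reps, pops


def clusters_vs_cutoff(fragments, cutoffs):
    """Number of clusters (library complexity) as a function of the RMSD cutoff."""
    return {c: len(leader_cluster(fragments, c)[0]) for c in cutoffs}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in data: 50 random 16-atom "tetra-peptides" (4 residues x N, CA, C, O).
    frags = [rng.normal(scale=2.0, size=(16, 3)) for _ in range(50)]
    print(clusters_vs_cutoff(frags, [0.5, 1.0, 2.0, 5.0]))
```

A very large cutoff collapses everything into one cluster and a zero cutoff makes every fragment its own cluster, which are the two limits discussed below.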



With a large enough cutoff, all tetra-peptides would fall into the same cluster as the initial seed; with a sufficiently small cutoff, every tetra-peptide would be unique.

When applied to 2ptn (bovine trypsin: for deeply rooted reasons my favorite PDB entry ever, and it contains most features of globular proteins, such as secondary and super-secondary structure, turns, etc.), the following numbers of representative clusters were found, shown as a function of the RMSD cutoff. One way of thinking about this approach is that the library can be thought of as containing every possible peptide conformation at a given error/variation/resolution. So, it's a sort of variable 'resolution' library. For 2ptn, you can see that the library complexity takes off below about 0.7 Angstrom RMSD. There is an asymptote at around 220, since this is about the number of residues in 2ptn, and hence about the number of tetra-peptides that can be extracted from it.







There are a few tricks that need handling in the code, primarily in the treatment of peptides that span chain breaks in the protein structure: for this analysis, the four residues needed to be covalently contiguous (i.e. no internal chain breaks).
So, we now have a way of building a representative library of peptide conformers that we can think about using as scaffolds to mimic in our PPI library (as well as the main-chain donor/acceptor positions, we also have the C-alpha to C-beta vectors).
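A minimal sketch of the chain-break check follows. It assumes a simple hypothetical residue representation (a chain ID plus backbone atom coordinates) and a 2.0 Angstrom tolerance on the C-N peptide bond between consecutive residues. The post doesn't say how breaks were detected, so both the data structure and the tolerance are my assumptions.

```python
import math
from dataclasses import dataclass

BACKBONE = ("N", "CA", "C", "O")
PEPTIDE_BOND_MAX = 2.0  # Angstrom; assumed tolerance (an ideal C-N peptide bond is ~1.33 A)


@dataclass
class Residue:
    """Hypothetical minimal residue record: chain ID plus backbone atom coordinates."""
    chain: str
    atoms: dict[str, tuple[float, float, float]]


def has_backbone(res: Residue) -> bool:
    """True if all four main-chain atoms are present."""
    return all(name in res.atoms for name in BACKBONE)


def bonded(a: Residue, b: Residue) -> bool:
    """True if b follows a covalently: same chain and C(a)-N(b) within tolerance."""
    if a.chain != b.chain:
        return False
    return math.dist(a.atoms["C"], b.atoms["N"]) <= PEPTIDE_BOND_MAX


def contiguous_windows(residues: list[Residue], size: int = 4):
    """Yield start indices of `size`-residue windows with complete backbones
    and no internal chain break."""
    for start in range(len(residues) - size + 1):
        window = residues[start:start + size]
        if not all(has_backbone(r) for r in window):
            continue
        if all(bonded(window[i], window[i + 1]) for i in range(size - 1)):
            yield start
```

For a chain of eight residues with a break between the fourth and fifth, only the windows starting at the first and fifth residues survive, which is the behaviour the analysis above requires.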

The next step is to extend this approach to a larger, more representative set of protein structures; let's use a validated (but ancient) paper for this:

%A U. Hobohm
%A C. Sander
%T Enlarged representative set of protein structures
%J Protein Science
%V 3
%P 522-524
%D 1994

Trivia: The photo above is of one of my sons, on mayoral voting day 2012, in a very wet London. You are never too young to learn about politics!


Update: Sorry, the figures got barfed by the Blogger software (a bad URL) and were lost, so I've replaced them.

Deadline Approaching for Computational Drug Discovery Course



The registration deadline of 7th May is fast approaching for the course we are hosting here in Hinxton, "Joint EMBL-EBI and Wellcome Trust Resources for Computational Drug Discovery". This joint EBI-Wellcome Trust course aims to teach participants the principles of chemical biology and how to use computational methods to probe, explore and modulate biological systems using chemical tools. The course will comprise a mixture of lectures and hands-on sessions. The conceptual framework will be covered, as well as direct practical experience of retrieving and analysing chemogenomics data. Participants will be able to do their own target analysis and identify appropriate chemical tools for probing biological systems of interest to them.

Check out more details at the link above.

Wednesday, 2 May 2012

How far behind the patent literature is the primary literature?


I can't believe that people haven't looked at this before, and I should have searched 'The Interwebs', so I'm not claiming this is world-leading or anything; but here's a little bit of analysis joining ChEMBL with some of the recently released patent data, addressing the question 'how far does the published literature lag behind the patent literature?'. The basic workflow is to identify a set of compounds in ChEMBL for which we also have patent data, then get dates for the patents, and for each molecule in both sets calculate the difference between the earliest literature date and the earliest patent date. This is what the distribution looks like....


So, on average it looks like the literature is about two to three years behind the patent literature, which is closer than I thought. The eagle-eyed will of course note that there are quite a few negative differences here, i.e. cases where a patent containing the compound structure was filed after a literature publication. A key caveat, though, is that this analysis makes no distinction between a compound being in the claims of a patent and just being mentioned in it.
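The workflow can be sketched in a few lines of Python: take the earliest date per compound in each source, keep the compounds present in both, and subtract. The compound IDs and dates below are hypothetical placeholders, not the ChEMBL or patent data used here, and dividing by 365.25 days is just a convenient approximation for years.

```python
from datetime import date
from statistics import mean, median


def earliest_dates(records):
    """Map compound ID -> earliest date, from an iterable of (compound_id, date) pairs."""
    first = {}
    for cid, d in records:
        if cid not in first or d < first[cid]:
            first[cid] = d
    return first


def literature_lag_years(literature_records, patent_records):
    """Earliest literature date minus earliest patent date, in years, for compounds
    found in both sources. Negative values mean the literature came first."""
    lit = earliest_dates(literature_records)
    pat = earliest_dates(patent_records)
    shared = sorted(lit.keys() & pat.keys())
    return {cid: (lit[cid] - pat[cid]).days / 365.25 for cid in shared}


# Hypothetical placeholder records, not real ChEMBL or patent data.
literature = [("CPD1", date(2010, 6, 1)), ("CPD1", date(2011, 1, 1)),
              ("CPD2", date(2005, 3, 1)), ("CPD3", date(2009, 1, 1))]
patents = [("CPD1", date(2008, 6, 1)), ("CPD2", date(2006, 3, 1)),
           ("CPD4", date(2007, 1, 1))]

lags = literature_lag_years(literature, patents)
values = list(lags.values())
print({cid: round(v, 2) for cid, v in lags.items()})
print(f"mean lag {mean(values):.2f} y, median {median(values):.2f} y")
```

Here CPD2 shows up with a negative lag, the same kind of case noted above where the literature publication preceded the patent.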

More to do on this, but it's an interesting start. If there's interest in exactly what was done, source of data, etc., I can go into that in the comments section....

Thanks to George for pulling together the data!

Tuesday, 1 May 2012

New Drug Approvals 2012 - Pt. X - Avanafil (Stendra™)







ATC code: G04BE (partial)
Wikipedia: Avanafil


On April 27th, the FDA approved Avanafil (tradename: Stendra; Research Code: TA-1790), a phosphodiesterase 5 (PDE5) inhibitor for the treatment of erectile dysfunction (ED). ED is a sexual dysfunction characterized by the inability to produce an erection of the penis. The physiologic mechanism of penile erection involves the release of nitric oxide in the corpus cavernosum during sexual stimulation, which in turn activates the enzyme guanylate cyclase, resulting in increased levels of cyclic guanosine monophosphate (cGMP). cGMP produces relaxation of smooth muscle tissues, which in the corpus cavernosum results in vasodilation and increased blood flow. Avanafil (PubChem: CID9869929, ChemSpider: 8045620) enhances the relaxant effects of cGMP by selectively inhibiting PDE5 (ChEMBL: CHEMBL1827; Uniprot: O76074), an enzyme responsible for the degradation of cGMP.

Other PDE5 inhibitors are already on the market, including Sildenafil (approved in 1998; tradename: Viagra, Revatio; ChEMBL: CHEMBL192), Tadalafil (approved in 2003; tradename: Cialis; ChEMBL: CHEMBL779) and Vardenafil (approved in 2003; tradename: Levitra; ChEMBL: CHEMBL1520). Sildenafil (as Revatio) and Tadalafil are also approved for the treatment of pulmonary arterial hypertension (PAH).

PDE5 is an 875 amino acid-long enzyme (EC=3.1.4.35), belonging to the cyclic nucleotide phosphodiesterase family (PFAM: PF00233).

>PDE5A_HUMAN cGMP-specific 3',5'-cyclic phosphodiesterase
MERAGPSFGQQRQQQQPQQQKQQQRDQDSVEAWLDDHWDFTFSYFVRKATREMVNAWFAE
RVHTIPVCKEGIRGHTESCSCPLQQSPRADNSAPGTPTRKISASEFDRPLRPIVVKDSEG
TVSFLSDSEKKEQMPLTPPRFDHDEGDQCSRLLELVKDISSHLDVTALCHKIFLHIHGLI
SADRYSLFLVCEDSSNDKFLISRLFDVAEGSTLEEVSNNCIRLEWNKGIVGHVAALGEPL
NIKDAYEDPRFNAEVDQITGYKTQSILCMPIKNHREEVVGVAQAINKKSGNGGTFTEKDE
KDFAAYLAFCGIVLHNAQLYETSLLENKRNQVLLDLASLIFEEQQSLEVILKKIAATIIS
FMQVQKCTIFIVDEDCSDSFSSVFHMECEELEKSSDTLTREHDANKINYMYAQYVKNTME
PLNIPDVSKDKRFPWTTENTGNVNQQCIRSLLCTPIKNGKKNKVIGVCQLVNKMEENTGK
VKPFNRNDEQFLEAFVIFCGLGIQNTQMYEAVERAMAKQMVTLEVLSYHASAAEEETREL
QSLAAAVVPSAQTLKITDFSFSDFELSDLETALCTIRMFTDLNLVQNFQMKHEVLCRWIL
SVKKNYRKNVAYHNWRHAFNTAQCMFAALKAGKIQNKLTDLEILALLIAALSHDLDHRGV
NNSYIQRSEHPLAQLYCHSIMEHHHFDQCLMILNSPGNQILSGLSIEEYKTTLKIIKQAI
LATDLALYIKRRGEFFELIRKNQFNLEDPHQKELFLAMLMTACDLSAITKPWPIQQRIAE
LVATEFFDQGDRERKELNIEPTDLMNREKKNKIPSMQVGFIDAICLQLYEALTHVSEDCF
PLLDGCRKNRQKWQALAEQQEKMLINGESGQAKRN
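To check a sequence length such as the 875 residues quoted above, a minimal FASTA parser is enough. The sketch below uses a hypothetical toy record; pasting the PDE5A record above in its place is a quick way to confirm the stated length.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records: dict[str, list[str]] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


example = """>toy_protein hypothetical example
MERAGPSFGQ
QRQQQQ
"""
for header, seq in parse_fasta(example).items():
    print(header, len(seq))
```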

Several crystal structures of PDE5 are now available. The catalytic domain of human PDE5 complexed with sildenafil is shown below (PDBe: 1tbf).





Preclinical studies have shown that Avanafil strongly inhibits PDE5 (half maximal inhibitory concentration, IC50 = 5.2 nM) in a competitive manner and is 100-fold more potent for PDE5 than for PDE6, which is found in the retina and is responsible for phototransduction. Avanafil has also shown higher selectivity over PDE6 (120-fold) than Sildenafil (16-fold) and Vardenafil (21-fold), and high selectivity over PDE1 (>10,000-fold) compared with Sildenafil (380-fold) and Vardenafil (1,000-fold).
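Fold selectivities translate directly into implied off-target potencies. A small sketch, assuming selectivity is expressed as the usual ratio IC50(off-target) / IC50(PDE5); the text doesn't state exactly which measure was used, so treat the derived values as rough.

```python
PDE5_IC50_NM = 5.2  # avanafil IC50 against PDE5, from the text


def implied_off_target_ic50(target_ic50_nm: float, fold_selectivity: float) -> float:
    """If selectivity = IC50(off-target) / IC50(target), then IC50(off-target) = fold x IC50(target)."""
    return target_ic50_nm * fold_selectivity


# The PDE1 value is a lower bound, since selectivity is quoted as >10,000-fold.
for isoform, fold in [("PDE6", 120), ("PDE1", 10_000)]:
    print(f"{isoform}: implied IC50 ~ {implied_off_target_ic50(PDE5_IC50_NM, fold):,.0f} nM")
```

Under that assumption, the 120-fold PDE6 selectivity corresponds to an IC50 of roughly 620 nM, and the PDE1 figure to more than about 52 µM.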

Avanafil has also been reported to act faster than Sildenafil, with an onset of action of as little as 15 minutes, as opposed to 30 minutes for the other drugs.


Avanafil is a synthetic small molecule with one chiral center. Avanafil has a molecular weight of 483.96 Da, an ALogP of 2.16, 3 hydrogen bond donors and 9 hydrogen bond acceptors, and is thus fully rule-of-five compliant. (IUPAC: 4-[(3-chloro-4-methoxyphenyl)methylamino]-2-[(2S)-2-(hydroxymethyl)-pyrrolidin-1-yl]-N-(pyrimidin-2-ylmethyl)pyrimidine-5-carboxamide; Canonical SMILES: COC1=C(C=C(C=C1)CNC2=NC(=NC=C2C(=O)NCC3=NC=CC=N3)N4CCC[C@H]4CO)Cl; InChI: InChI=1S/C23H26ClN7O3/c1-34-19-6-5-15(10-18(19)24)11-27-21-17(22(33)28-13-20-25-7-3-8-26-20)12-29-23(30-21)31-9-2-4-16(31)14-32/h3,5-8,10,12,16,32H,2,4,9,11,13-14H2,1H3,(H,28,33)(H,27,29,30)/t16-/m0/s1)
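The molecular weight can be checked directly from the formula in the InChI (C23H26ClN7O3). The small sketch below uses conventional standard atomic weights and gives about 484 Da. It also checks the four rule-of-five criteria using the ALogP, donor and acceptor counts quoted above. The helper names are my own, and the formula parser only handles simple formulas without brackets or charges.

```python
import re

# Conventional standard atomic weights, in g/mol (only the elements needed here).
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "Cl": 35.45}


def molecular_weight(formula: str) -> float:
    """Average molecular weight from a simple formula such as 'C23H26ClN7O3'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHTS[element] * (int(count) if count else 1)
    return total


def passes_rule_of_five(mw: float, logp: float, hbd: int, hba: int) -> bool:
    """True if none of Lipinski's criteria is violated: MW <= 500, logP <= 5, HBD <= 5, HBA <= 10."""
    return mw <= 500 and logp <= 5 and hbd <= 5 and hba <= 10


mw = molecular_weight("C23H26ClN7O3")  # formula taken from the InChI above
print(f"MW = {mw:.2f}")
print("Ro5 compliant:", passes_rule_of_five(mw, logp=2.16, hbd=3, hba=9))
```

This prints `MW = 483.96` and `Ro5 compliant: True`.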

The recommended starting dose of Avanafil is 100 mg, taken orally as needed approximately 30 minutes before sexual activity. Depending on individual efficacy and tolerability, the dose can be increased to a maximum of 200 mg or decreased to 50 mg. The lowest dose that provides efficacy should be used. The maximum recommended dosing frequency is once per day.

Avanafil is rapidly absorbed after oral administration, with a median Tmax of 30 to 45 minutes in the fasted state and 1.12 to 1.25 hours when taken with a high-fat meal. Avanafil is approximately 99% bound to plasma proteins and has been found not to accumulate in plasma. It is predominantly cleared by hepatic metabolism, mainly by the CYP3A4 enzyme and to a minor extent by the CYP2C isoform. The plasma concentrations of the major metabolites, M4 and M16, are approximately 23% and 29% of that of the parent compound, respectively. The M4 metabolite accounts for approximately 4% of the pharmacologic activity of Avanafil, with an in vitro inhibitory potency for PDE5 of 18% of that of Avanafil. The M16 metabolite has been found to be inactive against PDE5. After oral administration, Avanafil is excreted as metabolites mainly in the feces (approximately 62% of the administered dose) and to a lesser extent in the urine (approximately 21% of the administered dose). Avanafil has a terminal elimination half-life (t1/2) of approximately 5 hours, which is comparable to that of Sildenafil (3-4 h) and Vardenafil (4-5 h), but much shorter than the half-life of Tadalafil (17.5 h).
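Two of the pharmacokinetic numbers lend themselves to a quick check. The ~4% activity attributed to M4 is consistent with a simple exposure x potency estimate (0.23 x 0.18 ≈ 0.041). The half-lives can also be compared by the fraction of drug left after 24 hours. The sketch below assumes simple first-order (one-compartment) elimination, which is my simplification rather than a claim from the label.

```python
def relative_activity_contribution(exposure_fraction: float, potency_fraction: float) -> float:
    """Crude estimate: metabolite plasma exposure x its potency, both relative to the parent."""
    return exposure_fraction * potency_fraction


def fraction_remaining(hours: float, half_life_h: float) -> float:
    """First-order elimination: fraction of drug left after `hours`."""
    return 0.5 ** (hours / half_life_h)


# M4: ~23% of parent plasma concentration and ~18% of parent in vitro potency (from the text).
print(f"M4 contribution: {relative_activity_contribution(0.23, 0.18):.1%}")

# Fraction left 24 h after a dose, under the one-compartment assumption.
for drug, t_half in [("Avanafil", 5.0), ("Tadalafil", 17.5)]:
    print(f"{drug}: {fraction_remaining(24, t_half):.1%} left after 24 h")
```

Under this assumption only a few percent of an Avanafil dose remains after a day, versus well over a third for Tadalafil.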

The full prescribing information of Avanafil can be found here.

The license holder is Vivus, Inc.

ChEMBL Webinar on 30th May in Japanese only



For Japanese ChEMBLers,

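We will be holding a ChEMBL online seminar (webinar) on 30th May, starting at 5pm Japan time (9am UK time). It will give an overview of ChEMBL and show how to search it through the interface. On the day, the webinar will be given in Japanese by our Japanese staff. Questions are of course welcome, and anyone can take part.

Taking part is very easy (a web browser, with audio over a telephone line). To register, please use the Doodle poll (entering your name in English and your email address), or contact Ikeda, who is in charge of the webinar.

The schedule of our other webinars is here (please note that times are given in UK time). If there is demand, we will consider holding more webinars at times suited to Japan. Please feel free to contact us.

Also, on 11th May we will be giving a presentation on the ChEMBL database in Japan. If you are interested, please see here.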

Monday, 30 April 2012

Last chance to sign up for the webinar on Web Services - 2nd May



This is a last chance call for people who want to sign up for the "Web Services" webinar that will be hosted this Wednesday 2nd May at 3.30pm (GMT+1).

It will be a 45-minute webinar that will take you through the ChEMBL web services.

Remember to register your interest in our webinars on the Doodle Poll. Make sure that you leave your email address as well as your name so that we can send the connection details to you. Any problems, please contact chembl-help@ebi.ac.uk.

The poll will close tomorrow, 1st May, to allow us to send out the connection details to the attendees.