Tuesday, 29 May 2012

Costs of Assays



I'm giving some talks over the summer, and am getting bored with some of the material I have, so I'm thinking of some new things to put in to add a bit of variety and interest. I've become interested in assay-level attrition, and in trying to build more of a taxonomy and inter-relationship mapping between the assays used in drug discovery. As part of this, there is a cost component for each type of assay, going from very cheap to really, really expensive. Here's a little picture from the presentation I've put together - I used educated guesses for the costs, so please, please critique them!


So, what do people think of the guesstimates of costs per compound per assay point in the picture below? I know they are really variable, there are startup costs to set something up, etc., etc. But what do you think about the orders of magnitude: are they about right? One of the key features of the numbers I've put there is that there are big transitions at the switch between in silico and in vitro, and then on entering clinical trials.

The picture at the top of this post (about unicorns) is from the very funny http://www.depressedcopywriter.com/.

Thursday, 24 May 2012

Love Drug Information? 17th July 2012


An early heads up to a seminar from Andy Bell (now at Imperial College) to be held here on campus, detailing the discovery and development of UK-92,480 (also known as sildenafil and even better known as V1agra and R3vat10). Andy was one of the medicinal chemists and inventors on the PDE-5 inhibitor programme at Pfizer, and the story covers many aspects of drug discovery including, of course, the discovery of the side effect, and also one where the pharmacology led to many new molecular insights into NO signalling and PDE biology.

There are many myths about the discovery of V1agra, so this is a rare opportunity to hear the exciting story first-hand.

The seminar is on Tuesday July 17th 2012 at 2pm in room C209 - you will need to mail me in order to get registered with campus security if you don't work on campus. If you do, you can just turn up.

The presentation will not be webcast, or recorded, or posted on Youtube (sorry).

Monday, 21 May 2012

Radio 4 broadcast on Drug Discovery - Tuesday 22nd of May 8pm


Paul Workman of the Institute of Cancer Research is due to feature in a BBC Radio 4 special exploring the future of drug discovery in the UK, which will be broadcast tomorrow Tuesday 22nd May. The 40-minute programme, called The End of Drug Discovery and presented by veteran science broadcaster Geoff Watts, starts at 8pm.

The programme will “examine whether sources of better pharmaceutical treatments are drying up, in light of reports that suggest making new and improved drugs available to patients is becoming more difficult and increasingly expensive,” the BBC said. Professor Workman was recorded discussing the issues facing the pharmaceutical industry, and the role that non-profit and academic organisations like the ICR can play in driving drug discovery progress.

The programme will be repeated on Sunday 27th May at 5pm and you can also listen to it live here: http://www.bbc.co.uk/programmes/b01hxh76

GPCR Structures: delta opioid and nociceptin receptors


There will come a point in time when a new GPCR structure doesn't 'make the grade' for Nature or Science, but hopefully that time is still a considerable way off. Two new GPCR structures have been released recently - the nociceptin receptor 4ea3 and the delta-opioid receptor 4ej4 - bringing the grand total to 15 sequence-distinct rhodopsin-like GPCR structures in the public domain.

Both new structures were published, alongside two other previously released PDB entries, in last week's Nature.

Here is an alignment - thanks to the commenter on a previous post in this series who spotted my schoolboy error in a previous version.


  1. 3uon - human muscarinic M2 receptor 
  2. 4daj - rat muscarinic M3 receptor 
  3. 3rze - human histamine H1 receptor
  4. 2rh1 - human beta-2 adrenergic receptor 
  5. 2vt4 - turkey beta-1 adrenergic receptor 
  6. 3pbl - human dopamine D3 receptor
  7. 2ydv - human adenosine A2a receptor 
  8. 3v2w - human sphingosine-1-phosphate receptor 
  9. 4djh - human kappa opioid receptor 
  10. 4dkl - mouse mu opioid receptor 
  11. 4ej4 - mouse delta opioid receptor
  12. 4ea3 - human nociceptin receptor
  13. 3odu - human CXCR4 receptor 
  14. 2i35 - bovine rhodopsin 
  15. 2z73 - squid rhodopsin



                           10        20        30        40        50  
3uon   (  20 )                                             tfevvfivl
4dajA  (  64 )                                             iwqvvfiaf
3rze   (  28 )                                                 mplvv
2rh1   (  29 )                                            devwvvgmgi
2vt4A  (  40 )                                               weagmsl
3pblA  (  32 )                                                   yal
2ydv   (   3 )                                             imgssvYit
3v2w   (  17 )           sdyvnydIIvrHYnyTgklnisa                ltsv
4djhA  (  55 )                                            spaipviita
4dkl   (  65 )                                             mvtaitima
4ej4   (  41 )                                        rsasslalaiaita
4ea3A  (  47 )                                            plglkvtIvg
3oduA  (  27 )            pçfre-------------------------enanfnkiflpt
1u19A  (   1 )            mnGtegpnfyVPfsnktgvVrsPFeapQyyLaepwqFsmlAa
2z73A  (   9 )         etwwyNpsIvVhpHWref--------------dqvpdavYyslGi
                                                              aaaaaa

                           60        70        80        90        100 
3uon   (  29 )    vagslSlvTiigNilVmvSIkvnrhLqtvnnyflfSLAcADliiGvfSMn
4dajA  (  73 )    ltgflAlvTiigNilVivAFkvnkqLktvnnyFllSLAcADliIGviSMn
3rze   (  33 )    vlsticlvTvglNllVlyAvrserkLhtvGnlYIvsLSvADliVGavVMp
2rh1   (  39 )    vmslivlaIvfgNvlVitAIakferLqtvtnyFItsLAcADlvMGlaVVp
2vt4A  (  47 )    lmalVvllIvagNvlViaAigstqrLqtltnlFItsLAcADlvvGllVVp
3pblA  (  35 )    sYcalilaIvfgNglVcmAVlkeraLqtttnyLVvsLAvADllvAtlVMp
2ydv   (  12 )    vElaiavlAilgNvlVcwAvwlnsnLqnvtnyFVvsAAaADilVGvlAIp
3v2w   (  51 )    vfiliCcfIileNifvlltiwktkkFhrpMYyFIgnLAlSDllaGvaYta
4djhA  (  65 )    vysvvfvvGlvgNslVmfVIirytkmktaTniYIfNLAlADalVTtTMpf
4dkl   (  74 )    lYsiVcvvGlfgNflvmyvIvrytkMktAtniYIfNLAlADalATsTLpf
4ej4   (  55 )    lYsavcavGllgNvlvmfgIvrytkLktATniYIfNLAlADalATstLpf
4ea3A  (  57 )    lYlavcvgGllgNclvmyVIlrhtkmktatNiYIfNLAlADtlVLlTLpf
3oduA  (  44 )    iYsiIfltGivgNglvilvMgyqkklrsmtdkYRlhLSvADllFVitLpf
1u19A  (  43 )    yMflLimlGfpiNflTlyVTvqHkkLrtplNyILlnLAvADlfMVfg-GF
2z73A  (  40 )    fIgiCgiiGcggNgiViyLFtktksLqtpanmFiinLAfSDftFSlvNGf
                  aaaaaaaaaaaaaaaaaaaaaa      aaaaaaaaaaaaaaaaaa aaa

                           110       120       130       140       150 
3uon   (  79 )    lytlytvi-gyWplgpvvÇdlWlalDYvVSNAsVmNLliiSfdryfcvtk
4dajA  ( 123 )    lFttyiim-nrWalgnlaÇdlwLSiDYvASNAsVmNLlvISfDryfsitr
3rze   (  83 )    mnilyllm-skwsLgrplÇlfWLSmDYVASTASIfSVfiLCiDryrsvqq
2rh1   (  89 )    fgaahilm-kmWtfgnfwçefWTSiDVlCVTASIeTLcvIAvdryfAIts
2vt4A  (  97 )    fgatlvvr-gtWlwgsflçelWTSlDVlCVTAsIeTLcvIAiDrylaits
3pblA  (  85 )    wvvylevtggvWnfsricÇdvFVTlDVmMcTAsIwNLCaISidRytAVvm
2ydv   (  62 )    faiaIst---GfçaaçhgÇLfiACfVLVLTASSIfSLlaIAiDryiairi
3v2w   ( 101 )    Nlllsga--tTykLtPaqWFlREGsMFvALSASVfSLlaIAieryitmlk
4djhA  ( 115 )    qstvylmn--sWpfgdvlÇkiVlsiDyyNMfTSIfTLtmMSvdRyiaVch
4dkl   ( 124 )    qsvnylmg--tWpfgnilÇkiviSidYyNMFTSIfTLctMSvdRyiAVCh
4ej4   ( 105 )    qsakylme--tWpfgellÇkaVlSidYyNMFTSIfTLtmMSvDRyiavch
4ea3A  ( 107 )    QGtdillg--fWpfgnalÇktVIaiDyyNMFTSTfTLtaMSvdryvaich
3oduA  (  94 )    WavDAva---nWyfgnflÇkaVHviYTVNlYSSVwILAfISlDRylAiVh
1u19A  (  92 )    tTTlyTSlhGyFvfgptGÇnlEGffATLGGEIaLWSLvvLaieRyvvVck
2z73A  (  90 )    plMtiSCflkkWifgfaaÇkvYGfiGGiFGFMsIMTMAMiSiDrynViGr
                  aaaaaaa        aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa  

                           160       170       180       190       200 
3uon   ( 128 )    pltypvk---rttkmAgmmiaaAwvlSfilwapaIlfwqfivg-------
4dajA  ( 172 )    pltyrak---rttkrAgvmiglAwviSfvlWApaIlfwqyfvg-------
3rze   ( 132 )    plrylky---rtktrAsatilgawflSfl-WvipIlgwnh          
2rh1   ( 138 )    pfkyqSl---ltknkArviilmvwivSgltSflpIqmhwyr-----athq
2vt4A  ( 146 )    pfryqsl---mtrarAkviictvwaiSalvSflpImmhwWr-----dedp
3pblA  ( 135 )    pvhyqhgtgqsscrrValmitavwvlAfaVSc-pLlfgfNtTg-------
2ydv   ( 109 )    plryngl---vtgtrAkgiiaicwvlSfaIGltPmlgwnnÇgqp--kegk
3v2w   ( 156 )               nnfrlfllisacwviSlilGglPimgwn-----------
4djhA  ( 163 )    pvkaldf---rtplkAkiinicIwllSssvGisAivlGGtkvred-----
4dkl   ( 172 )    pvkaldf---rtprnAkivnvcNwilSsaiGlpVmfmAttkyrqg-----
4ej4   ( 153 )    pvkaldf---rtpakAklinicIwvlAsgvGvpimvmAvtqprdg-----
4ea3A  ( 155 )    p          tsskAqavnvaIwalAsvvGvpvaimGsAqvede-----
3oduA  ( 141 )    atn---sqrprkllAekvVyvgVwipAlllT-ipDfif--Anvsead---
1u19A  ( 142 )    pmsn----frfgenhaimgvafTwvmAlaCAapPlvgwSrYIPE------
2z73A  ( 140 )    pmaas---kkMshrrAfimiifVwlwSvlwAigPifgwGaYtLE------
                              aaaaaaaaaaaaaaaaaaaa aaa              

                           210       220       230       240       250 
3uon   ( 168 )    ----vrtVedgeÇyIqff------snaavtfgtAiaaFylpviiMtvlyw
4dajA  ( 212 )    ----krtVppgeÇfIqfl------septitfgtAiaaFymPvtiMtilyw
3rze   ( 175 )           rredkÇeTdfy------dvtwfkvmtaiinFylPtllMlwfya
2rh1   ( 180 )    eAinÇyae-etçÇdff--------TnqayaiasSivSFyvplviMvfvYs
2vt4A  ( 188 )    qAlkçyqd-pgçÇdfv--------TnrayaiasSiiSFyipLliMifval
3pblA  ( 177 )    --------dptvÇsIs---------npdFViySSvvSFylPfgvTvlvya
2ydv   ( 154 )    ahsqgÇgegqvAÇlFedVV-----pmnYMVyfNffaCVlvPlllMlgvyl
3v2w   ( 184 )    ----ÇisalssÇSTVLP-------LYhkhYIlfCTtvFtllllsIvilYc
4djhA  ( 205 )    -------vdvieÇslqFpdddyswwdlfmkicVfifAfviPvliIivcyt
4dkl   ( 214 )    ---------sidçtltfsh-ptwywenllKicVfifAfimPvliItvcyg
4ej4   ( 195 )    ---------avvÇmlqfps-pswywdtvtkicvflfAfvvPiliitvcyg
4ea3A  ( 197 )    ---------eieÇlveipt-pqdywgpvfaiciflfSFivPvlvIsvcys
3oduA  ( 182 )    --------dryiÇdrfyp---ndlwvvvfqfqhimvglilPgivIlsCyc
1u19A  ( 182 )    -------GMQCSÇGIDYYTpheetnNesFViyMfvvHfiiPlivIffcyg
2z73A  ( 181 )    -------GVLCNÇSFdYIsr--dsttrsNIlcMFilGffgPiliiffCyf
                                            aaaaaaaaaa  aaaaaaaaaaaa

                           260       270       280       290       300 
3uon   ( 208 )    hisrasksri                   pppsrekkvtrtilaIllaFi
4dajA  ( 252 )    rIyketek                       like   aqTlsaIllaFi
3rze   ( 212 )    kIykaVrqhc                   lhmnrerkaakQLgfIMaaFi
2rh1   ( 221 )    rVfqeakrql                   kfclkeHkaLktlgiIMgtFt
2vt4A  ( 229 )    rvyreakeq                       irehkalktlgiImgvFt
3pblA  ( 210 )    rIyvvlkqrrrk-----------------gvplrekkatqMVaiVlgaFi
2ydv   ( 199 )    rIflaarrqlkqmesq             stlqkevhaakSLaiIvglFa
3v2w   ( 223 )    riyslvrtr                   asrssenvaLlkTViiVLsvFi
4djhA  ( 248 )    lMilrlksvrllsg              rekdrnlrritrLVlvVVavFv
4dkl   ( 254 )    lmilrlksvr                   ekdrnlrritrMVlvVvavFi
4ej4   ( 235 )    lMllrlrsvr                   ekdrslrriTrMVlvVvgaFv
4ea3A  ( 237 )    lMirrlrgvrlls-------------gsrekdrnlrritrLVlvVvavFv
3oduA  ( 221 )    iIisklshs                     kghqkrkalktTviLilaFf
1u19A  ( 225 )    qLvftvkeaaaq------------qqesattqkaekevTrMviiMviaFl
2z73A  ( 222 )    nIvmsvsnhekemaamakrlnakelrkaqaganaemrlAkIsivIVsqFl
                  aaaaa                            aaaaaaaaaaaaaaaaa

                           310       320       330       340       350 
3uon   ( 398 )    itWapYNvmVlintfçap--------ç--ipntvwtiGywlCYinstiNp
4dajA  ( 501 )    itWtpyNimVlvntfçds--------ç--ipktywnlgywlCYiNStvNP
3rze   ( 426 )    lCWipYFiffmviafçkn--------ç--cnehlhmftiWlGYiNStlNP
2rh1   ( 284 )    lcWlpFFiVNivhviqdn----------lirkevyillNwiGYvNSgfNp
2vt4A  ( 301 )    lCWlpFFlvnivnvfnrd----------lvpdwlfvafnwlGYAnSAmnp
3pblA  ( 340 )    vCWlpFFltHvlnthçqt--------ç-hvspelysattwlGYvNsalNP
2ydv   ( 244 )    lCWlpLHiiNcftffçpd--------çshaplwlMylAivlSHtNSvvNP
3v2w   ( 267 )    acwapLFiLLllDvgçkvk------tç--diLfrAeyfLvlAvlNSgtNP
4djhA  ( 285 )    vcWtpIHifilvealgs            aalssyyfcIalGytNSslNP
4dkl   ( 291 )    vcWtpIHiyViikaliti-------pettfqtvswhfcialGYtNSclNp
4ej4   ( 272 )    vCWapIHifVivwtlvdi------nrrdplvvaalhlcialGYaNSslNp
4ea3A  ( 274 )    gcWtpVQvfvlaqglgvq-------pssetavailrfctAlGYvNSclNp
3oduA  ( 250 )    acWlpyyigisidsfilleiikqgçefentvhkwisitEAlAFfHCclNp
1u19A  ( 263 )    iCWlpYAgvAfyIfthqgsd---------fgpifMTipAFfAKtSAvyNP
2z73A  ( 272 )    lSWspYAvvAllAQfgplew---------VtpyaAQlpVMfAKaSaihNP
                  aaaaaaaaaaaaaaa                aaaaaaaaaaaaa   aaa

                           360       370       380       390       400 
3uon   ( 438 )    acYalcnatFkktfkhllm                               
4dajA  ( 541 )    vcYalcnktFrttfkt                                  
3rze   ( 466 )    liYplCnenFkktfkrilhi                              
2rh1   ( 324 )    liYc-rspdfriAfqellcl                              
2vt4A  ( 341 )    iiYc-rspdfrkAfkrlla                               
3pblA  ( 381 )    viYttfnieFrkAflkilsc                              
2ydv   ( 286 )    fiyAyrireFrqTFrkiirshvlrqqepfkaa                  
3v2w   ( 309 )    iiytltNkemrrafiri                                 
4djhA  ( 328 )    ilYafldenFkrcfrdfcfp                              
4dkl   ( 334 )    vlYafldenFkrCfrefci                               
4ej4   ( 316 )    vlYaflDenfkrc                                     
4ea3A  ( 317 )    ilYafldenFkacfr                                   
3oduA  ( 300 )    ilyaflgakfktsaqhalts                              
1u19A  ( 304 )    viYimmnkqFrnCmvttlccgknplgddeasttVsktetsqvapa     
2z73A  ( 313 )    miYsvsHpkFreAIsqtfpwvLtccqfddketeddkdaeteipage    
                  aaaaa  aaaaaaaaaa                                 

                           410

%T Crystal structure of the µ-opioid receptor bound to a morphinan antagonist
%A A. Manglik
%A A.C. Kruse
%A T.S. Kobilka
%A F.S. Thian
%A J.M. Mathiesen
%A R.K. Sunahara
%A L. Pardo
%A W.I. Weis
%A B.K. Kobilka
%A S. Granier
%J Nature 
%V 485
%P 321–326
%O doi:10.1038/nature10954
%D 2012

%T Structure of the human κ-opioid receptor in complex with JDTic
%A H. Wu
%A D. Wacker
%A M. Mileni
%A V. Katritch
%A G.W. Han
%A E. Vardy
%A W. Liu
%A A.A. Thompson
%A X.-P. Huang
%A F.I. Carroll
%A S.W. Mascarella
%A R.B. Westkaemper
%A P.D. Mosier
%A B.L. Roth
%A V. Cherezov
%A R.C. Stevens
%J Nature 
%V 485
%P 327–332
%O doi:10.1038/nature10939
%D 2012

%T Structure of the nociceptin/orphanin FQ receptor in complex with a peptide mimetic
%A A.A. Thompson
%A W. Liu
%A E. Chun
%A V. Katritch
%A H. Wu
%A E. Vardy
%A X.-P. Huang
%A C. Trapella
%A R. Guerrini
%A G. Calo
%A B.L. Roth
%A V. Cherezov
%A R.C. Stevens
%J Nature 
%V 485
%P 395–399
%O doi:10.1038/nature11085
%D 2012

%T Structure of the δ-opioid receptor bound to naltrindole
%A S. Granier
%A A. Manglik
%A A.C. Kruse
%A T.S. Kobilka
%A F.S. Thian
%A W.I. Weis 
%A B.K. Kobilka
%J Nature
%V 485
%P 400–404
%O doi:10.1038/nature11111
%D 2012

Why Drug Repurposing for Small Molecule Drugs is Probably Best Done in Academia



There was a lovely paper in Nature Medicine recently by James McKerrow and colleagues from UCSF on the discovery that auranofin is a good anti-amoeba agent. As the name suggests, auranofin contains gold, and would not make many people's lists of drug-like compounds, but hey, it's a drug, a real drug!

Anyway, it got me thinking a bit about drug reuse: the current drug development process is probably configured to make drug repurposing/indication expansion/rescue/whatever a sweet spot for academics, with relatively few incentives for pharma and biotech to pursue it actively. I'm sure others have thought of this before, so sorry for being repetitive. I think biologicals will be a different kettle of fish.
  • Current patent life is too short to allow cautious post-approval studies in new indications in most cases, especially for diseases that are chronic and have a long read-out. What are the incentives for a company to perform these trials, when in reality they are probably getting a small bump in revenue at the tail end of the patent life, and building a market for generics? As an aside, I am coming firmly to the view that the patent system for drugs is just wrong. To base the system of reward for 21st Century R&D in healthcare on the length of medieval apprenticeships is just plain mad (see here for a little pointer to background).
  • Regulators are (quite rightly) cautious, and the extra hurdles of cost-effectiveness and price negotiations again don't encourage the developing company to actively pursue new indications for a drug. There was a recent case with Aricept, where a recent study recommended use at an earlier stage of Alzheimer's, following many years of regulators turning down the developers' requests for earlier use on the basis of their own data. If it is the case that Aricept is helpful to patients with milder disease, for this to happen when the drug becomes generic is a bitter pill (no pun intended). I will check the facts on this and then update if it is way off.
  • Taken further - isn't one strategy for health providers with an eye on costs to resist new indications precisely until the drug is generic? Of course, this allows the building of a good safety record for the drug, across a far more diverse and ill patient population.
  • Academics (pre-clinical and clinical, and of course non-profits) have different drivers with respect to reward, and there is maybe a pretty good alignment between the academic 'business model' and drug repositioning studies. However, other factors may kick in here: lack of funding, risk aversion, lack of experience and naivety about the process. There are also sharp contrasts between the personality-free operations of pharma (usually) and the personality-rich environment of academia (usually).
  • Release of data from the initial compound developers is obviously a good thing - many pairs of eyes looking at the data, preventing wasted effort, etc. - but what are the drivers for release of such data? It is expensive to do, and would be seen by some (investors and staff) as throwing money away.

Monday, 14 May 2012

ChEMBL Webinar 16th May 'Schema & SQL Querying' - Posted by Louisa




This is a last call for people wanting to sign up for the "Schema & SQL Querying" webinar that will be hosted this Wednesday 16th May at 3.30pm (BST).

It will be a 45 minute webinar that will take you through the ChEMBL schema and also how to use SQL queries to extract data from the database.

Remember to register your interest in our webinars on the Doodle Poll. Make sure that you leave your **email address** as well as your name so that we can send the connection details to you. Any problems, please contact chembl-help@ebi.ac.uk.

For those of you who can't make it to this webinar, we will be hosting it again on the 27th June.

Thursday, 10 May 2012

USAN Watch - May 2012

The USANs for May 2012 have just been published. 


Update: It looks like there is now a publication of the list, then a few more added. I've captured these, as I note them, but it's not ideal.....


| USAN | Research code | Drug class | Therapeutic class | Target |
|---|---|---|---|---|
| asparaginase Erwinia chrysanthemi | crisantapase (INN) | enzyme | therapeutic | n/a |
| cindunistat hydrochloride maleate | PHA-728669F | synthetic small molecule | therapeutic | NOS? |
| enzalutamide | MDV-3100 | synthetic small molecule | therapeutic | AR |
| flutemetamol F18 | [18F]AH-110690 | synthetic small molecule | imaging agent | |
| sodium glycerophosphate | | natural product derived small molecule | supplement | n/a |
| lasmiditan, lasmiditan succinate | LY-683974, COL-144 | synthetic small molecule | therapeutic | 5HT1F |
| lifitegrast, lifitegrast sodium | SAR-1118-023 | synthetic small molecule | therapeutic | LFA-1 integrin |
| neceprevir sodium | ACH-0142684.Na, ACH-2684.Na | synthetic small molecule | therapeutic | HCV NS3 |
| nintedanib, nintedanib esylate | BIBF-1120 | synthetic small molecule | therapeutic | FGFR, PDGFR, VEGFR |
| serelaxin | RLX-030 | synthetic small molecule | therapeutic | RXFP1, RXFP2 |
| trenonacog alfa | Factor IX | enzyme | therapeutic | n/a |
| vercirnon sodium | GSK-1605786A | synthetic small molecule | therapeutic | CCR9 |

Monday, 7 May 2012

New Drug Approvals 2012 - Pt. XI - Taliglucerase alfa (Elelyso™)






ATC code: A16AB11
Wikipedia: Taliglucerase alfa
 

On May 1, the FDA approved taliglucerase alfa for the treatment of Type I Gaucher's disease. Gaucher's disease is the most common of the lysosomal storage diseases. It is a hereditary disease caused by a deficiency of the enzyme β-glucocerebrosidase (UniProt: P04062), also called β-glucosidase. Gaucher's disease is a rare genetic disease with an incidence of 1 in 50,000 births and is considered an orphan disease. Type I Gaucher's disease is about 100 times more common in people of Ashkenazi Jewish descent than in the general North American population. Symptoms of type I Gaucher's disease typically begin in early adulthood and include an enlarged liver and grossly enlarged spleen, impaired bone structure, anemia and low platelet levels, leading to prolonged bleeding and easy bruising. If enzyme replacement therapy (ERT) is available, the prognosis for patients with type I Gaucher's disease is good.

β-Glucocerebrosidase is an enzyme of 536 amino acids with a molecular weight of approximately 59.7 kDa. The gene for β-glucocerebrosidase is located on chromosome 1 (1q21). The enzyme catalyzes the hydrolysis of glucocerebrosides (e.g. ChEBI:18368), a process required for the turnover of the cellular membranes of red and white blood cells. When the enzyme is deficient, macrophages clearing these cells fail to metabolize the lipids, accumulating them instead in their lysosomes. Thus, macrophages turn into dysfunctional Gaucher cells and abnormally secrete inflammatory signals. The deficiency of glucocerebrosidase in Type I Gaucher's disease is only partial and in most cases is caused by a mutation replacing asparagine with serine at residue 370 of the protein sequence. The deficiency of the mutant enzyme can be compensated by injection of an exogenous replacement, which drastically improves the prognosis for patients with type I Gaucher's disease. Prior to the approval of taliglucerase alfa, imiglucerase and velaglucerase alfa were already available as ERTs for type I Gaucher's disease. The graphic below illustrates the reaction catalyzed by β-glucocerebrosidase and ERTs. The enzyme classification code for β-glucocerebrosidase is 3.2.1.45.



Taliglucerase alfa is a monomeric glycoprotein containing 4 N-linked glycosylation sites and has a molecular weight of 60.8 kDa. The recombinant enzyme differs from native human glucocerebrosidase by two amino acids at the N terminus and up to 7 amino acids at the C terminus. Taliglucerase alfa is decorated with mannose-terminated oligosaccharide chains that are specifically recognized by macrophage receptors and assist in 'homing' the enzyme to its target cells.

Taliglucerase alfa is the first ERT expressed in plant cells (carrot root cells), not mammalian cells. Cultures of plant cells are more cost-effective for the expression of recombinant enzymes. 


Crystal structure of the human glucocerebrosidase (PDBe 1ogs).


The recommended dose is 60 Units/kg of body weight administered once every 2 weeks as a 60-120 minute intravenous infusion. A Unit is the amount of enzyme that catalyzes the hydrolysis of 1 micromole of the synthetic substrate para-nitrophenyl-β-D-glucopyranoside (pNP-Glc) per minute at 37°C. Adverse effects include pharyngitis, headache, arthralgia, flu and back pain.

Taliglucerase alfa is marketed by Pfizer and Protalix under the brand name Elelyso.


The full prescribing information can be found here.


Sunday, 6 May 2012

Are there around 10^19 Lipinski-like small molecules?


I'm a big fan of the work of Jean-Louis Reymond at the University of Berne, and am starting to imagine a time when the enormity of chemical space can be reasonably comprehensively mapped and explored, at least for 'fragment-sized' molecules. In the field of bioinformatics, the number of possible peptides is considered quite large - for example, for a peptide composed from the 20 natural amino acids, there are 20^10 possible distinct decapeptides. This is 10240000000000, or 1.024 x 10^13, which is a big number of course, but not that big, and a decapeptide will have an average molecular weight of about 1,100 Da. For a 500ish molecular weight natural peptide (a pentapeptide) there are only 20^5 = 3.2 million possibilities. However, small molecules comprehensively trash these 'biologically constrained' numbers, making cheminformatics, I think, a great frontier and challenge for HPC and "large data".
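The peptide counts quoted above are simple powers of 20, and can be checked with a few lines of Python. This is only a sketch of the arithmetic; the average residue mass of 110 Da is an assumption brought in here to relate peptide length to molecular weight, not a figure from the post.

```python
# Number of distinct linear peptides of a given length built from
# the 20 natural amino acids: 20 ** length.
N_AMINO_ACIDS = 20

# Assumption (not from the post): ~110 Da average mass per residue.
AVG_RESIDUE_MASS_DA = 110


def peptide_count(length: int) -> int:
    """Return the number of distinct sequences of `length` residues."""
    return N_AMINO_ACIDS ** length


decapeptides = peptide_count(10)   # 20^10
pentapeptides = peptide_count(5)   # 20^5

print(f"decapeptides:  {decapeptides:,} (~{10 * AVG_RESIDUE_MASS_DA} Da)")
print(f"pentapeptides: {pentapeptides:,} (~{5 * AVG_RESIDUE_MASS_DA} Da)")
```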

The GDB databases give some idea of the size of drug-like chemical space. If you take the current GDB databases, and plot the size of the library as a function of the number of heavy atoms...

...you get classic exponential growth: the largest library is so much bigger than the smaller sets that it dominates the total number of compounds. So on a linear-scale plot it looks like this, 

but on a log scale it's approximately linear, and a regression can readily be fitted to it.


So, extrapolating to a GDB containing 33 heavy atoms (which, at an average heavy atom mass of 15 Da, corresponds to a molecular weight of around 500) gives about 10^19 to 10^20 distinct molecules. Of course, there are a bunch of assumptions behind the GDB enumeration approach (limited elements, but sensibly limited), and the fraction of Lipinski-compliant molecules within that set is an open question, but even if only 1% are, it doesn't affect this number too much. 
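For anyone wanting to repeat this sort of extrapolation, here is a minimal sketch of a log-linear regression in plain Python. The function names are made up for illustration, and the input numbers are synthetic placeholders, NOT real GDB library sizes; swap in the actual per-heavy-atom counts from the GDB publications to get a meaningful answer.

```python
import math


def fit_log_linear(heavy_atoms, counts):
    """Least-squares fit of log10(count) = slope * n + intercept."""
    xs = list(heavy_atoms)
    ys = [math.log10(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def extrapolate_log10(slope, intercept, n_atoms):
    """Predicted log10(library size) at n_atoms heavy atoms."""
    return slope * n_atoms + intercept


# Synthetic placeholder data (NOT real GDB counts): a library that grows
# by an arbitrarily chosen factor of 10^0.5 per additional heavy atom.
toy_atoms = [5, 6, 7, 8, 9, 10]
toy_counts = [10 ** (0.5 * n) for n in toy_atoms]

slope, intercept = fit_log_linear(toy_atoms, toy_counts)
predicted = extrapolate_log10(slope, intercept, 33)
print(f"slope = {slope:.3f} log10 units per heavy atom")
print(f"toy extrapolation to 33 heavy atoms: 10^{predicted:.1f}")

# The molecular weight estimate in the text: 33 heavy atoms at ~15 Da each.
print(f"approximate molecular weight: {33 * 15} Da")
```

Extrapolating this far beyond the fitted range is of course sensitive to the slope, so the result is best read as an order-of-magnitude estimate.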

10^19 is too big to even think about storing - as SMILES it is a zettabyte scale storage problem alone, but smart subset sampling, and the ever growing advances in data compression, processor power and connectivity, will no doubt start to chip away at this challenge of chemical comprehensiveness. 

As an aside - a Google search shows that one of the largest storage arrays in the world at the moment is a 150 petabyte system at IBM Almaden - so 1 zettabyte is about 7,000 times the size of this.
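The storage estimate can be sketched as back-of-envelope arithmetic. The figure of 50 bytes per SMILES string is an assumption made here purely for illustration (real strings vary, and compression would help); the 150 petabyte array size is the one quoted above.

```python
# Back-of-envelope storage estimate for 10^19 molecules stored as SMILES.
N_MOLECULES = 10 ** 19
BYTES_PER_SMILES = 50      # assumption: average uncompressed SMILES length
PETABYTE = 10 ** 15        # bytes (decimal units)
ZETTABYTE = 10 ** 21       # bytes (decimal units)
ARRAY_SIZE_PB = 150        # storage array size quoted in the post

total_bytes = N_MOLECULES * BYTES_PER_SMILES
total_zb = total_bytes / ZETTABYTE
arrays_per_zb = ZETTABYTE / (ARRAY_SIZE_PB * PETABYTE)

print(f"SMILES storage: {total_zb} ZB")
print(f"150 PB arrays per zettabyte: {arrays_per_zb:,.0f}")
```

Under these assumptions the full set comes out at roughly half a zettabyte, which is consistent with calling it a zettabyte scale problem.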

Thursday, 3 May 2012

PPI Library - Part 3



It turns out that scientists and the rest of the world interpret 'PPIs' as very different acronyms - as the amount of Payment Protection Insurance comment spam I’ve had to delete shows. Anyway, life got in the way of science for a few weeks for me ( :( ), but some more of the PPI work is described here. 

A very simple algorithm was applied to build a library of experimental peptide conformers. Firstly, every tetra-peptide from a protein structure was extracted; one of these peptides was then taken as a seed for a conformational cluster, and subsequent tetra-peptides were fitted to this fragment. If the RMSD for the main-chain atoms was lower than a cutoff parameter, the tetra-peptide was added to that cluster, with the original 'seed' fragment taken as its representative. If the RMSD was greater than the cutoff, then a new cluster was established, and any subsequent tetra-peptides were fitted to both cluster representatives, and so forth. As more unique peptide conformers are seen (defined according to the RMSD cutoff) the number of clusters increases. Of course, the population of each cluster is stored - some conformers are really common (alpha-helix and beta-strand fragments) and others are rare/experimental errors. 



At a large cutoff parameter, all tetra-peptides would cluster in the same set as the initial seed, and at a sufficiently small cutoff, every tetra-peptide would be unique.
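The seed-based clustering described above can be sketched in a few lines of Python. This is an illustration rather than the code actually used: the function names are hypothetical, and to keep it free of external libraries the fitted main-chain RMSD is swapped for a distance-matrix RMSD (dRMSD), which needs no superposition but cannot tell mirror-image conformers apart. Any other distance function can be passed in instead.

```python
import math
from itertools import combinations


def drmsd(a, b):
    """Distance-matrix RMSD between two equal-length lists of (x, y, z)
    points. Stand-in for the superposition RMSD used in the post."""
    pairs = list(combinations(range(len(a)), 2))
    total = 0.0
    for i, j in pairs:
        total += (math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2
    return math.sqrt(total / len(pairs))


def leader_cluster(fragments, cutoff, distance=drmsd):
    """Greedy seed-based clustering.

    Each fragment joins the first cluster whose seed lies within `cutoff`;
    otherwise it becomes the seed of a new cluster.
    Returns a list of (seed, population) pairs.
    """
    seeds = []
    populations = []
    for frag in fragments:
        for k, seed in enumerate(seeds):
            if distance(seed, frag) < cutoff:
                populations[k] += 1
                break
        else:
            # No existing seed was close enough: start a new cluster.
            seeds.append(frag)
            populations.append(1)
    return list(zip(seeds, populations))


# Toy four-point fragments (made-up coordinates, not real peptides).
straight = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
shifted = [(x + 5.0, y + 1.0, z) for x, y, z in straight]
bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
toy = [straight, shifted, bent, straight]

for cutoff in (100.0, 0.5, 0.0):
    clusters = leader_cluster(toy, cutoff)
    print(cutoff, [population for _, population in clusters])
```

The toy run shows the two limits: a very large cutoff puts everything in one cluster, and a cutoff of zero makes every fragment its own cluster. Note that the result of this kind of greedy clustering depends on the order in which fragments are presented.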

When applied to 2ptn (bovine trypsin, for deeply rooted reasons my favorite PDB entry ever, and one that contains most features of globular proteins: secondary and super-secondary structure, turns, etc.) the following numbers of representative clusters were found, shown as a function of the RMSD cutoff. One way of thinking of this approach is that the library can be thought of as containing every possible peptide conformation, at a given error/variation/resolution. So, it’s a sort of variable ‘resolution’ library. For 2ptn, you can see that the library complexity takes off below about 0.7 Angstrom RMSD. There is an asymptote at around 220, since this is about the number of residues in 2ptn. 







There are a few tricks that need handling in the code, primarily in the treatment of peptides that span chain breaks in the protein structure - for this analysis, the four residues needed to be covalently contiguous (i.e. no internal chain breaks).
So, we now have a way of building a representative library of peptide conformers that we can think about using as scaffolds for mimicking in our PPI library (as well as the mainchain donor/acceptor positions, we also have the C-alpha to C-beta vectors).
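One way to handle the chain-break requirement is to accept a four-residue window only if each consecutive pair of residues is joined by a plausible peptide bond. The sketch below assumes a hypothetical `Residue` record holding backbone N and carbonyl C coordinates, and a 1.5 Angstrom tolerance on the C-N distance (a typical peptide bond is about 1.33 Angstrom); both are illustrative choices, not details taken from the original code.

```python
import math
from typing import NamedTuple


class Residue(NamedTuple):
    """Hypothetical minimal residue record for this sketch."""
    chain: str
    number: int
    n: tuple   # backbone N coordinates (x, y, z)
    c: tuple   # backbone carbonyl C coordinates (x, y, z)


MAX_PEPTIDE_BOND = 1.5  # Angstrom; assumed tolerance for a C-N peptide bond


def is_bonded(prev, nxt, max_bond=MAX_PEPTIDE_BOND):
    """True if prev's C and nxt's N are close enough to be bonded."""
    return prev.chain == nxt.chain and math.dist(prev.c, nxt.n) <= max_bond


def contiguous_windows(residues, size=4):
    """Return every window of `size` residues with no internal chain break."""
    windows = []
    for start in range(len(residues) - size + 1):
        window = residues[start:start + size]
        if all(is_bonded(a, b) for a, b in zip(window, window[1:])):
            windows.append(window)
    return windows


def make_chain(n_res, gap_after=None):
    """Build a toy straight chain; optionally insert a break after an index."""
    chain = []
    for i in range(n_res):
        offset = 10.0 if gap_after is not None and i > gap_after else 0.0
        x = 3.8 * i + offset
        chain.append(Residue("A", i + 1, (x, 0.0, 0.0), (x + 2.5, 0.0, 0.0)))
    return chain


print(len(contiguous_windows(make_chain(8))))               # unbroken chain
print(len(contiguous_windows(make_chain(8, gap_after=3))))  # one chain break
```

A distance test like this catches breaks that residue numbering alone can miss, for example missing loops that have been renumbered, or insertion codes.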

The next step is to extend this approach to a larger, more representative library of protein structures. Let's use a validated (but ancient) paper for this.

%A U. Hobohm
%A C. Sander
%T Enlarged representative set of protein structures
%J Protein Science
%V 3
%P 522-524
%D 1994

Trivia: The photo above is of one of my sons, on mayoral voting day 2012, in a very wet London. You are never too young to learn about politics!


Update: Sorry the figures got barfed by the blogger software with a bad url, and got lost, so I've replaced them.