ChEMBL Resources


Friday, 29 June 2012

New Drug Approvals 2012 - Pt. XIII - Lorcaserin hydrochloride (Belviq™)

ATC Code: A08A (incomplete)
Wikipedia: Lorcaserin

On June 27th, the FDA approved Lorcaserin hydrochloride (Tradename: Belviq™; Research Code: APD-356), a selective serotonin 2C receptor (5-HT2C) agonist, for chronic weight management in adults with an initial body mass index (BMI) of 30 kg/m² or greater (obese), or of 27 kg/m² or greater (overweight) together with at least one weight-related comorbid condition (e.g. hypertension, dyslipidemia, type 2 diabetes).

Lorcaserin is believed to decrease food consumption and promote satiety by selectively activating 5-HT2C receptors on anorexigenic pro-opiomelanocortin neurones in the hypothalamus, although the exact mechanism of action is not fully established. At therapeutic concentrations, lorcaserin is selective for 5-HT2C over 5-HT2B receptors, which should make it less prone to the cardiovascular (heart valve) side-effects associated with previous serotonergic weight management drugs. It is the first new anti-obesity drug approved by the FDA since orlistat in 1999. Given the safety issues of previous drugs, such as dexfenfluramine, which was withdrawn in 1997, development focussed on safety, and on the basis of the available data lorcaserin has been approved without a boxed warning.

The 5-HT2C receptor (Uniprot: P28335, ChEMBL: CHEMBL225) belongs to the G-protein coupled receptor (GPCR) type 1 family, and binds the endogenous neurotransmitter serotonin. Its activation inhibits dopamine and norepinephrine release.

>5HT2C_HUMAN 5-hydroxytryptamine receptor 2C

Lorcaserin (IUPAC: (5R)-7-chloro-5-methyl-2,3,4,5-tetrahydro-1H-3-benzazepine; Canonical SMILES: C[C@H]1CNCCc2ccc(Cl)cc12; PubChem: 11658860; Chemspider: 9833595; ChEMBLID: CHEMBL360328; Standard InChI Key: XTTZERNUQAFMOF-QMMMGPOBSA-N) is a benzazepine with a single chiral center, a molecular weight of 195.7 Da, 1 hydrogen bond acceptor, 1 hydrogen bond donor and an ALogP of 2.75. The compound is therefore fully rule-of-five compliant.
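For readers who want to check the rule-of-five claim themselves, here is a minimal pure-Python sketch. It computes the molecular weight from the formula C11H14ClN (counted from the SMILES above) using standard atomic weights, and applies Lipinski's four cut-offs to the ALogP, donor and acceptor counts quoted in the text. Using ALogP in place of Lipinski's original calculated logP is an assumption, as is the way donors and acceptors are counted; a cheminformatics toolkit would normally compute all of these directly from the structure.

```python
# Standard atomic weights (rounded) for the elements present in lorcaserin.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "Cl": 35.45}

def molecular_weight(formula_counts):
    """Sum atomic weights over an element -> atom count mapping."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in formula_counts.items())

def ro5_violations(mw, logp, hbd, hba):
    """Return the Lipinski rule-of-five criteria that are violated (empty list = compliant)."""
    rules = {
        "MW > 500": mw > 500,
        "logP > 5": logp > 5,
        "HBD > 5": hbd > 5,
        "HBA > 10": hba > 10,
    }
    return [name for name, violated in rules.items() if violated]

# Lorcaserin free base, C11H14ClN, counted from C[C@H]1CNCCc2ccc(Cl)cc12.
lorcaserin = {"C": 11, "H": 14, "Cl": 1, "N": 1}
mw = molecular_weight(lorcaserin)
# ALogP is used here as a stand-in for Lipinski's calculated logP (an assumption).
violations = ro5_violations(mw, logp=2.75, hbd=1, hba=1)
print(f"MW = {mw:.1f} Da, violations: {violations or 'none'}")
```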

Lorcaserin is available as 10 mg film-coated oral tablets, and the recommended dose is 10 mg twice daily (20 mg/day). The plasma half-life (t1/2) of lorcaserin is approximately 11 hours, and the drug is moderately bound (~70%) to human plasma proteins. The absolute bioavailability of lorcaserin has not been reported.
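As a rough illustration of what an ~11 hour half-life means in practice, the sketch below uses the textbook one-compartment, first-order elimination model. The 12 hour dosing interval assumes the 20 mg daily dose is taken as one 10 mg tablet twice daily, and the unbound fraction simply follows from the ~70% protein binding; both are simplifications, and real lorcaserin pharmacokinetics may not follow this model closely.

```python
import math

def elimination_rate(t_half_h):
    """First-order elimination rate constant k = ln(2) / t1/2, in 1/hour."""
    return math.log(2) / t_half_h

def fraction_remaining(t_h, t_half_h):
    """Fraction of a dose still present after t hours: C(t)/C0 = exp(-k t)."""
    return math.exp(-elimination_rate(t_half_h) * t_h)

def accumulation_ratio(tau_h, t_half_h):
    """Steady-state accumulation ratio for repeated dosing: R = 1 / (1 - exp(-k tau))."""
    return 1.0 / (1.0 - fraction_remaining(tau_h, t_half_h))

T_HALF = 11.0    # approximate plasma half-life in hours (from the text)
TAU = 12.0       # assumed dosing interval for twice-daily dosing, in hours
FU = 1 - 0.70    # unbound fraction, assuming ~70% plasma protein binding

print(f"k = {elimination_rate(T_HALF):.3f} per hour")
print(f"fraction left after {TAU:.0f} h: {fraction_remaining(TAU, T_HALF):.2f}")
print(f"accumulation ratio: {accumulation_ratio(TAU, T_HALF):.2f}")
print(f"unbound fraction: {FU:.2f}")
```

Under these assumptions roughly half of each dose is still present when the next one is taken, so steady-state levels end up a little under twice those after a single dose.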

Lorcaserin is extensively metabolised in the liver by multiple enzymatic pathways. It also inhibits CYP2D6, so exposure to co-administered CYP2D6 substrates may increase. The two major metabolites are lorcaserin sulfamate and N-carbamoyl glucuronide lorcaserin.

Because lorcaserin is a serotonergic drug, patients should be monitored for the emergence of serotonin syndrome. For the full list of adverse reactions and drug-drug interactions, please refer to the full prescribing information.

The license holder for Belviq™ is Arena Pharmaceuticals, and the full prescribing information can be found here.

Wednesday, 27 June 2012

Do all roads lead to the two Cambridges and Basel?

So, some very sad news yesterday: the announcement of the closure of the Nutley R&D facilities of Roche. Families disrupted, kids moved to new schools, partners' careers affected, and then the inevitable greater competition for the few remaining jobs and downward pressure on salary and benefits. It's a tough time to be a drug discoverer wanting to work in industry. What strikes me in all this is the geographical focussing of the drug discovery industry in three places: Cambridge MA, Cambridge Cambs (or 'Cambridge Classic' as I call it) and Basel (you could maybe also add a fourth, South San Francisco, to this list). You can understand why this happens: companies want access to a 'liquid' talent pool, employees want 'robustness' against future layoffs, and this leads to the focussing of 'resources' in a few geographical locations.

It would be interesting to pull together the data for the last 40 years on R&D site locations and staff sizes, and then plot them on an interactive map, showing first growth and then retrenchment. Load me up with some armodafinil, and I'll have a go.

Tuesday, 26 June 2012

Webinar Reminder - Schema & SQL Querying (Posted by Louisa)

This is a last call for people who would like to sign up for the "Schema & SQL Querying" webinar that will be hosted this Wednesday 27th June at 3.30pm (BST).

It will be a 45-minute webinar that will take you through the ChEMBL schema and show you how to use SQL queries to extract data from the database.
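To give a flavour of the kind of query involved, here's a minimal sketch using Python's built-in sqlite3 module. It builds a tiny in-memory database borrowing a few real ChEMBL table names (molecule_dictionary, assays, activities, target_dictionary), but with heavily simplified columns and a single placeholder activity row, so it is a toy rather than the real schema. The join path (compound to activity to assay to target) is the part meant to carry over; check the actual schema documentation for the full column lists.

```python
import sqlite3

# Toy subset of ChEMBL-style tables: columns are simplified, data is placeholder only.
SCHEMA = """
CREATE TABLE molecule_dictionary (molregno INTEGER PRIMARY KEY, chembl_id TEXT, pref_name TEXT);
CREATE TABLE target_dictionary  (tid INTEGER PRIMARY KEY, chembl_id TEXT, pref_name TEXT);
CREATE TABLE assays             (assay_id INTEGER PRIMARY KEY,
                                 tid INTEGER REFERENCES target_dictionary(tid));
CREATE TABLE activities         (activity_id INTEGER PRIMARY KEY,
                                 assay_id INTEGER REFERENCES assays(assay_id),
                                 molregno INTEGER REFERENCES molecule_dictionary(molregno),
                                 standard_type TEXT, standard_value REAL, standard_units TEXT);
"""

# Join activities to compounds, and via assays to targets, filtered on a target ChEMBL ID.
QUERY = """
SELECT md.chembl_id, td.chembl_id, act.standard_type, act.standard_value, act.standard_units
FROM activities act
JOIN molecule_dictionary md ON md.molregno = act.molregno
JOIN assays a              ON a.assay_id   = act.assay_id
JOIN target_dictionary td  ON td.tid       = a.tid
WHERE td.chembl_id = ?
"""

def run_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO molecule_dictionary VALUES (1, 'CHEMBL360328', 'LORCASERIN')")
    conn.execute("INSERT INTO target_dictionary VALUES (1, 'CHEMBL225', '5-hydroxytryptamine receptor 2C')")
    conn.execute("INSERT INTO assays VALUES (1, 1)")
    # Placeholder activity value, not a real measurement.
    conn.execute("INSERT INTO activities VALUES (1, 1, 1, 'Ki', 1.0, 'nM')")
    rows = conn.execute(QUERY, ("CHEMBL225",)).fetchall()
    conn.close()
    return rows

for row in run_demo():
    print(row)
```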

Remember to register your interest in our webinars on the Doodle Poll. Make sure that you leave your **email address** as well as your name so that we can send the connection details to you. Any problems, please contact

Thursday, 21 June 2012

"Daddy, I didn't know you're a bad man" - Open Access

I got a call from a tearful girl recently, saying

"Daddy, I didn't know you did bad things at work"

Once over the initial shock, I wondered what she could be talking about. Did she know about the short-cut I use to get to the river to watch the trout? Or that I sometimes ask people in front of me in the queue for the coffee machine to get me a coffee, saving my time at the inconsiderate expense of everyone else in the queue? No, it was because I was involved in "Open Access". Lucky, then, that she didn't know I was actually working on a more extreme version, the more fundamentalist version that is the destroyer of capitalism: Open Data! ;)

I said,

"Princess, it's not that bad; honest; Daddy isn't a bad man",

She laughed at my distressed response, and then said

"you should read the Daily Mail then".

This is what she had read.

The Daily Mail is one of the most widely read UK newspapers. It is also a well-known source of shockers about the health threats and benefits of just about every household object or food, but the fact is that it's read by about 4.4 million Britons every day.

Here is the introductory paragraph of the article (used under 'fair use').

Up to £1billion of income and thousands of jobs could be placed at risk as a result of a move by Downing Street to allow Google and other digital search engines ‘open access’ to the nation’s best academic and scientific research.

So a lot of jobs are going to go in the UK and we're going to lose 'income' (what is that: tax income? sales revenue?). I also don't quite understand the distinction between academic and scientific research. I thought the mandatory open access proposals applied solely to publicly funded research, and don't, and shouldn't, extend to privately sponsored research, if that is what lies behind the distinction. Downing Street and Google are also semi-structured concepts plucked from the ether to build a web of plausibility for what follows, ultimately feeding xenophobia and protectionism.

Here is some more from the article,

But UK businesses fear that the proposals will destroy Britain’s highly-regarded academic publishing industry that modifies raw research, publishes it in the form of academic magazines, journals and books and exports it to the rest of the world.

Most would agree that the use of the word 'exports' is also contrived, but which UK businesses will suffer? Well, it turns out it's the publishing businesses that make significant profits from the current system, and which are becoming concerned that this may be threatened by the proposed change to UK research publication.

You're all intelligent and well informed, so I don't need to comment any more, but just read it, go on, read it!

One leading publishing group said the move to provide all of Britain’s academic output online for nothing could destroy a £1 billion industry that employs 10,000 people here and in its overseas operations.

Much of the scientific work from the nation’s leading research universities is passed on to the academic publishing industry where it is subjected to so-called ‘peer review’, or examination by experts, before it is published in journals and books that are also available online.

The material is a valuable source of income to UK publishing houses such as Reed Elsevier, one of Britain’s leading publishers with a market value of £6billion, as well as the hugely-respected Oxford University and Cambridge University Presses.

and also

In reality academic publishers and researchers fear that scientific and other academic studies, paid for by the taxpayer, will be made freely available to researchers in China and elsewhere in the Far East.

Most rational people would say 'good' to this; that's the way global science is meant to work. Also, who were the researchers who would have agreed with this nonsense? I always thought of Reed Elsevier as a globally operating Anglo-Dutch company, rather than one deeply rooted in, and operating primarily within, the United Kingdom, which is the impression most readers would be left with.

This is the funniest bit though...

Publishers are concerned that if an open access policy is adopted then some of the biggest scientific companies, such as GlaxoSmithKline, might move research work from British labs to those overseas where it will able to protect itself from open access.

Not attributed to GSK themselves, of course, but a sort of 'Munchausen's by proxy' worry on behalf of others. The concept of moving overseas to protect your business from open access is just so funny, especially given GSK's activities in pre-competitive research, collaboration and Open Data release.

There's also stuff linking the proposal to make more research available to people to internet piracy, etc.

Seriously though, the fact that this sort of nonsense is read by a very large number of people, and is aimed directly at influencing public opinion, is to me scary. The implicit threats in the article, and its linkage of open access to economic stability and national competitiveness, are arguably irresponsible.

What have I learnt from this? Well, the debate and argument isn't to be held amongst ourselves; it's with the public, the broad base of funders of our work (via their tax contributions, bequests and charitable donations). We should make them proud of the way we spend their money, we should be accountable for this, and our output should be available to all of them, not just in our own country but throughout the world.

Imagine the pride of a researcher mother or father when they get a random call from their kids, saying

"Hey, I'm so proud that you work in Open Access, and all my friends think you're way cool"

Monday, 18 June 2012

Invitation to join the Teach-Discover-Treat Initiative

The Teach-Discover-Treat (TDT) initiative was launched at the ACS meeting in San Diego under the umbrella of the Computers in Chemistry Division. The slides that describe the initiative are available on the website.

TDT aims to address outstanding gaps in drug discovery education and treatments for neglected diseases. A competition was launched that solicits submissions of computational models and tutorials for drug discovery for neglected diseases. All tools used for the computational workflows must be freely available (open source, free web servers, free download of executables) to enable global collaboration and innovation.

There are 4 categories in the competition, and 4 awards! Three of the categories focus on specific neglected diseases for which datasets have been provided. Specific requirements have been formulated for the workflows to be submitted in these three categories; they are outlined in the readme files that are part of the data download packages. Additional background information is available in the slides from the kick-off symposium, which can be found on the website. Real-life impact on these three challenges will be realized through experimental follow-up on the winning submissions (compound acquisition, synthesis, and biochemical testing through various academic partnerships). A fourth "open innovation" category seeks innovative drug discovery workflows that are either exemplified on a neglected disease project or adaptable to a neglected disease application.

The datasets for the competition are available here:

Visit the website for comprehensive information and note that the competition closes on September 5, 2012.

Saturday, 16 June 2012

Carbon and Oxygen - Simples

Bissan and I just had a really good breakfast discussion on drugs, and it prompted me to do a picture for the blog. There'll be some more, depending on collecting glasses from the opticians, getting some ivermectin for the tick infections that have appeared on Vini and Bruce the bearded dragons (named after two of the major protagonists of the fantastic and situationist Durutti Column), the length of the queue of cars at the municipal tip in Sutton, and other weekend domestic stuff.

Carbon and oxygen are two essential elements of life, and really important in drug structures; in fact, most drugs are 'organic' chemicals, those based primarily on carbon chemistry. Carbon and oxygen can combine in various ways to give a number of compounds, the oxides of carbon. These are...

  • Carbon dioxide - CO2 - that well known environmental villain (and also life sustaining chemical)
  • Carbon monoxide - CO - that well known poison (and also endogenous signalling molecule)
  • Carbon suboxide - C3O2 - that not so well known oxide of carbon (this one has few virtues). This is one of my interview questions to probe people's chemistry knowledge (keep this important fact a secret though).

As an interesting aside, these are all linear molecules.

Imagine the atom ratios of these compounds plotted on a line between a carbon fraction of 1.0 (the element carbon, which occurs in a variety of expensive/worthless allotropes) and 0.0 (the element oxygen, which under normal conditions occurs as that life-giving, and also poisonous, molecule dioxygen, and as ozone, again a life-giving/taking molecule). So for the carbon oxides, CO is at coordinate 0.5, CO2 at 0.333 and C3O2 at 0.6. This is what it looks like.....
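The arithmetic behind that line is simple enough to sketch: each compound's coordinate is just the fraction of its atoms that are carbon. This minimal Python version uses exact fractions; the same function works for any binary compound, so it is shown for a few nitrogen oxides as well.

```python
from fractions import Fraction

def atom_fraction(composition, element):
    """Fraction of all atoms in a formula that belong to `element`, as an exact Fraction."""
    total = sum(composition.values())
    return Fraction(composition.get(element, 0), total)

CARBON_OXIDES = {
    "CO": {"C": 1, "O": 1},
    "CO2": {"C": 1, "O": 2},
    "C3O2": {"C": 3, "O": 2},
}

NITROGEN_OXIDES = {
    "N2O": {"N": 2, "O": 1},
    "NO": {"N": 1, "O": 1},
    "NO2": {"N": 1, "O": 2},
    "N2O4": {"N": 2, "O": 4},
    "N2O5": {"N": 2, "O": 5},
}

for name, comp in CARBON_OXIDES.items():
    frac = atom_fraction(comp, "C")
    print(f"{name:5s} carbon fraction = {frac} ({float(frac):.3f})")

for name, comp in NITROGEN_OXIDES.items():
    frac = atom_fraction(comp, "N")
    print(f"{name:5s} nitrogen fraction = {frac} ({float(frac):.3f})")
```

Note that NO2 and N2O4 land on exactly the same point, since only the atom ratio is plotted.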

Update - here's the same but for combinations of oxygen and nitrogen.

Of course, things become more interesting when you mix in more elements... What I am discovering, though, is that triangular graphs are more complicated than I remember from when I was 12 years old.
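For the triangular case, one standard approach is barycentric coordinates: each corner of an equilateral triangle is a pure element, and a compound's position is the composition-weighted average of the corners. Below is a minimal sketch for carbon, nitrogen and oxygen with hydrogen ignored; which element goes at which corner is an arbitrary choice here, not necessarily how the blog pictures were made.

```python
import math

def ternary_to_cartesian(a, b, c):
    """Map a three-component composition (a, b, c) onto an equilateral triangle.

    Corners: pure A at (0, 0), pure B at (1, 0), pure C at (0.5, sqrt(3)/2).
    Raw atom counts are accepted; they are normalised to fractions first.
    """
    total = a + b + c
    if total <= 0:
        raise ValueError("composition must contain at least one atom")
    b_frac, c_frac = b / total, c / total
    x = b_frac + c_frac / 2
    y = c_frac * math.sqrt(3) / 2
    return x, y

# Component order: (carbon, nitrogen, oxygen); hydrogen is ignored.
EXAMPLES = {
    "CO2": (1, 0, 2),
    "N2O": (0, 2, 1),
    "urea, CH4N2O": (1, 2, 1),
}
for name, (n_c, n_n, n_o) in EXAMPLES.items():
    x, y = ternary_to_cartesian(n_c, n_n, n_o)
    print(f"{name:14s} x = {x:.3f}, y = {y:.3f}")
```

Binary compounds fall on the edges of the triangle (CO2 on the C-O edge, N2O on the N-O edge), so the carbon-oxygen line above is just one side of it.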

Congratulations, Janet Thornton DBE!!

So, congratulations are in the air again - this time to Janet Thornton, Director of the EMBL-EBI (although her wikipedia page is now out of date). Janet has just been awarded the British honour of Dame Commander of the Most Excellent Order of the British Empire (DBE) in the 2012 Queen's Birthday Honours list. This is absolutely fantastic news, and well done from all in the ChEMBL group. Many of you will already know Janet's fantastic track record of innovation in the study of protein structure, enzyme function and, more recently, some of the molecular processes of ageing, but Janet has also done great things in championing the sharing of biological data, and in encouraging sharing and collaboration on a global scale, for example via the establishment of the ELIXIR infrastructure.

It's great to see her achievements and service recognised more broadly in society through this titular honour.

Update: someone has updated the wikipedia page. Citizen curation in action!

Wednesday, 13 June 2012

Congratulations to the 2012 Chemistry World Entrepreneur of the Year

Congratulations to one of our collaborators, Paul Workman FMedSci of the Institute of Cancer Research, on being awarded the RSC Chemistry World Entrepreneur of the Year award for 2012. This award recognises individuals who have made significant contributions to the commercialisation of science. Paul has, of course, a fantastic record of significant scientific discovery in molecularly targeted cancer research, but he has also consistently looked for ways to generate value and patient benefit from these discoveries, founding a number of companies to move them forward to potential therapies and products. Spending time with Paul really makes clear the drive, belief and energy required to make a great scientific entrepreneur and innovator.

Examples of his commercialisation of science include the founding of Piramed (acquired in a successful exit by Roche) and Chroma Therapeutics.

So well done Paul! And for readers of the blog interested in hearing some of Paul's translational research, there's a great opportunity to do this at the EMBO Chemical Biology 2012 meeting.

Tuesday, 12 June 2012

New Drug Approvals 2012 - Pt. XII - Pertuzumab (Perjeta™)

ATC Code: L01XC13
Wikipedia: Pertuzumab

On June 8th 2012, the US FDA approved pertuzumab (also known as RG-1273 and RhuMAb-2C4; tradename: Perjeta) for the treatment of patients with HER2/ERBB2-positive metastatic breast cancer who have not received prior anti-HER2 therapy or chemotherapy for metastatic disease. Breast cancer is the most common cancer in women. About 20% of breast cancers have amplified and overexpressed Human Epidermal Growth Factor Receptor 2 (HER2, also known as ERBB2), and these subtypes are associated with a worse prognosis and higher rates of metastasis.

Pertuzumab is a recombinant humanized anti-ERBB2/HER2 monoclonal antibody. It has been approved for use in a triple combination with trastuzumab (another anti-ERBB2/HER2 antibody) and the taxane docetaxel. The added value of combining the two antibodies is that pertuzumab binds to a different part of ERBB2, the extracellular dimerization domain (subdomain II), and in this way sterically blocks ligand-dependent heterodimerization with other HER family members, whereas trastuzumab binds to and inhibits the juxtamembrane portion of the extracellular domain.
Pertuzumab inhibits ligand-initiated intracellular signaling through two major pathways: blocking the MAP kinase pathway leads to cell growth arrest, and blocking the PI3 kinase pathway leads to apoptosis.

Superposition of the structures of pertuzumab (red) bound to ERBB2 (pink) (PDBe: 1s78) and trastuzumab (blue) bound to ERBB2 (green) (PDBe: 1n8z).

Pertuzumab has been issued a Black Box Warning because it can cause embryo-fetal death and birth defects, and thus cannot be used by women who are pregnant.

The target of pertuzumab is Human Epidermal Growth Factor Receptor 2 (ERBB2, HER2) (Uniprot: P04626; ChEMBL: CHEMBL1824; canSAR: ERBB2-P04626).

>sp|P04626|ERBB2_HUMAN Receptor tyrosine-protein kinase erbB-2 OS=Homo sapiens GN=ERBB2 PE=1 SV=1
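Headers like the one above follow the UniProtKB FASTA convention: database, accession and entry name separated by pipes, then the protein name and KEY=value fields such as OS (organism), GN (gene name), PE (protein existence level) and SV (sequence version). Here is a minimal regex-based parser sketch; it assumes well-formed headers and does not try to cover every variant of the format.

```python
import re

# Expected shape: >db|Accession|EntryName Protein name OS=... [OX=...] GN=... PE=... SV=...
HEADER_RE = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry>\S+)\s+(?P<rest>.*)$"
)
# Each KEY=value field starts with whitespace followed by a known key.
KEY_RE = re.compile(r"\s(OS|OX|GN|PE|SV)=")

def parse_uniprot_header(line):
    """Split a UniProtKB FASTA header line into a dict of named fields."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a UniProtKB FASTA header: {line!r}")
    # re.split with a capturing group yields [name, KEY1, value1, KEY2, value2, ...].
    parts = KEY_RE.split(" " + m.group("rest"))
    fields = {
        "db": m.group("db"),
        "accession": m.group("accession"),
        "entry_name": m.group("entry"),
        "protein_name": parts[0].strip(),
    }
    for key, value in zip(parts[1::2], parts[2::2]):
        fields[key] = value.strip()
    return fields

header = ">sp|P04626|ERBB2_HUMAN Receptor tyrosine-protein kinase erbB-2 OS=Homo sapiens GN=ERBB2 PE=1 SV=1"
print(parse_uniprot_header(header))
```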

Perjeta is marketed by Genentech Inc, a member of the Roche group.

Full prescribing information can be obtained here, and the product website here.

Drug Side Effect Prediction and Validation

There's a paper just published in Nature from Novartis/UCSF that is getting a lot of coverage on the internet at the moment, and for good reason, although, as the cartoon above states, it will probably have less impact than news of Justin Bieber's new haircut or the latest handbags from Christian Lacroix. It uses the SEA target prediction method, trained on ChEMBL bioactivity data, to predict new targets (and then, by association, side effects) for existing drugs. These predictions were then tested experimentally, and confirmed in a number of cases. This experimental validation is clearly complex and expensive, so it is great news that in silico methods can start to generate realistic and testable hypotheses for adverse drug reactions (there are positive side effects too, and these are pretty interesting to look for using these methods as well).

The use of SEA as the target prediction method was inevitable given the authors involved but, following up on some presentations at this spring's National ACS meeting in San Diego, there would also seem to be clear benefits in including other methods of linking a compound to a target: nearest neighbour using simple Tanimoto measures, and naive Bayes/ECFP-type approaches. The advantage of the SEA approach is that it seems to generalise better (sorry, I can't remember who gave the talk on this), and so can probably make more comprehensive/complete predictions and be less tied to the training data (in this case ChEMBL); however, as databases grow, these predictions will get a lot better. Big improvements will also be possible if other data sources adopt the same basic data model as ChEMBL (or something like the services in OpenPHACTS), so that methods can pool across different data sources, including proprietary in-house data.
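To make the nearest-neighbour idea concrete, here is a minimal sketch: each target is represented by the fingerprints of its known ligands, and a query compound is assigned the targets whose most similar ligand passes a Tanimoto cut-off. This is the simple nearest-neighbour approach, not SEA (which compares whole ligand sets and puts the scores on a statistical footing). The fingerprints are hypothetical toy bit sets rather than real ECFPs, and the 0.5 threshold is an arbitrary choice.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two fingerprints stored as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

def predict_targets(query_fp, reference, threshold=0.5):
    """Rank targets by the best Tanimoto similarity of any of their ligands to the query.

    `reference` maps target id -> list of ligand fingerprints (sets of ints).
    Targets whose best similarity is below `threshold` are dropped.
    """
    scores = {}
    for target, ligands in reference.items():
        best = max((tanimoto(query_fp, fp) for fp in ligands), default=0.0)
        if best >= threshold:
            scores[target] = best
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy fingerprints and target ids, for illustration only.
REFERENCE = {
    "TARGET_A": [{1, 2, 3, 4}, {2, 3, 5}],
    "TARGET_B": [{7, 8, 9}],
    "TARGET_C": [{1, 2, 9, 10}],
}
query = {1, 2, 3, 5}
print(predict_targets(query, REFERENCE))
print(predict_targets(query, REFERENCE, threshold=0.3))
```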

There are probably papers being written right now about a tournament/consensus multi-method approach to target prediction using an ensemble of the above methods. (If such a paper uses random forests, and I get asked to review it, it will be carefully stored in /dev/null) ;)

So some things I think are useful improvements to this sort of approach.

1) Inclusion of the functional assays from ChEMBL in predictions (i.e. don't tie oneself to a simple molecular target assay). The big problem here, though, is that although pooling of target bioassay data is straightforward, pooling/clustering of functional data is not.
2) Where do you set affinity thresholds, and how do the affinities relate to the pharmacodynamics of the side effects? My view is that there will be some interesting analyses of ChEMBL that maybe, just maybe, allow one to address this issue. Remember, we know quite a lot about the exposure of the human body to a given drug at a given dose level...
3) Consideration of (active) metabolites. It's pretty straightforward now to predict structures of likely metabolites (not at a quantitative level though) and this may be useful in drugs that are extensively metabolised in vivo.
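On point 2, one simple way to connect affinities to exposure is to estimate fractional target occupancy at the free plasma concentration, using single-site equilibrium binding, occupancy = C / (C + Kd). The sketch below assumes that only unbound drug engages the target, treats Ki as equivalent to Kd, and uses a 50% occupancy cut-off; all of these, and the numbers in the example, are illustrative assumptions rather than anything from the paper.

```python
def free_concentration(total_conc_nM, fraction_unbound):
    """Unbound drug concentration, assuming only unbound drug can reach the target."""
    return total_conc_nM * fraction_unbound

def fractional_occupancy(free_conc_nM, kd_nM):
    """Single-site equilibrium binding: occupancy = C / (C + Kd)."""
    return free_conc_nM / (free_conc_nM + kd_nM)

def flag_off_targets(total_cmax_nM, fraction_unbound, affinities_nM, occupancy_cutoff=0.5):
    """Return the off-targets whose predicted occupancy at Cmax meets the cut-off."""
    c_free = free_concentration(total_cmax_nM, fraction_unbound)
    flagged = {}
    for target, kd in affinities_nM.items():
        occ = fractional_occupancy(c_free, kd)
        if occ >= occupancy_cutoff:
            flagged[target] = round(occ, 3)
    return flagged

# Hypothetical exposure and affinity values, for illustration only.
affinities = {"OFF_TARGET_1": 20.0, "OFF_TARGET_2": 2000.0}
print(flag_off_targets(total_cmax_nM=200.0, fraction_unbound=0.3, affinities_nM=affinities))
```

The point of the sketch is that the same Ki can matter or not depending on dose and protein binding, which is why a single fixed affinity threshold across all drugs is hard to justify.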

Anyway, to finish off, some eye-candy: a picture from the paper (hopefully allowed under fair use!).

And here's a reference to the paper, in good old AT&T Bell Labs refer format (Mendeley-Schmendeley, as my mother used to say when I was a boy).

%T Large-scale prediction and testing of drug activity on side-effect targets
%A E. Lounkine
%A M.J. Keiser
%A S. Whitebread
%A D. Mikhailov
%A J. Hamon
%A J.L. Jenkins
%A P. Lavan
%A E. Weber
%A A.K. Doak
%A S. Côté
%A B.K. Shoichet
%A L. Urban
%J Nature
%D 2012
%O doi:10.1038/nature11159
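For anyone who hasn't met refer format: each line is a field, marked by % and a single letter (%T title, %A author, %J journal, %D date, %O other information), and repeatable fields such as %A simply appear several times. Here's a minimal Python parser sketch for a single record; it skips any line that doesn't start with % and ignores other refinements of the format.

```python
from collections import defaultdict

def parse_refer(text):
    """Parse one refer-format record into a dict of field letter -> list of values.

    Repeatable fields such as %A (author) keep every occurrence, in order.
    """
    record = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("%") or len(line) < 2:
            continue
        code, _, value = line[1:].partition(" ")
        record[code].append(value.strip())
    return dict(record)

RECORD = """
%T Large-scale prediction and testing of drug activity on side-effect targets
%A E. Lounkine
%A M.J. Keiser
%A S. Whitebread
%A D. Mikhailov
%A J. Hamon
%A J.L. Jenkins
%A P. Lavan
%A E. Weber
%A A.K. Doak
%A S. Côté
%A B.K. Shoichet
%A L. Urban
%J Nature
%D 2012
%O doi:10.1038/nature11159
"""

ref = parse_refer(RECORD)
print(ref["T"][0], "|", ", ".join(ref["A"]), "|", ref["J"][0], ref["D"][0])
```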

Sunday, 10 June 2012

ChEMBL target links in wikipedia

Links to ChEMBL compounds from wikipedia have been there for some time, and now there is the target equivalent; for example, here is the link to human thrombin.

Thursday, 7 June 2012

Assays, assays and a few more assays....

So, some more stuff on assays, in my quest to have something different to speak about over the summer. This post is about the tests a compound needs to pass through before it can become a drug. For a real test case, rather than the green and red blobs I normally talk about, I took an excellent paper published in Science in 2010, which describes the discovery of a clinical candidate for the treatment of malaria from a natural product screen. NITD609 is currently in phase 1 trials.

%T Spiroindolones, a Potent Compound Class for the Treatment of Malaria
%J Science 
%D 2010
%V 329
%P 1175-1180 
%O DOI: 10.1126/science.1193225
%A M. Rottmann 
%A C. McNamara
%A B.K.S. Yeung
%A M.C.S. Lee
%A B. Zou
%A B. Russell
%A P. Seitz
%A D.M. Plouffe
%A N.V. Dharia
%A J. Tan
%A S.B. Cohen
%A K.R. Spencer
%A G.E. González-Páez
%A S.B. Lakshminarayana
%A A. Goh
%A R. Suwanarusk
%A T. Jegla
%A E.K. Schmitt
%A H.-P. Beck
%A R. Brun
%A F. Nosten
%A L. Renia
%A V. Dartois
%A T.H. Keller
%A D.A. Fidock
%A E. A. Winzeler
%A T.T. Diagana

The great thing about this paper is that it gives a reasonably complete package of data in the supplementary material, and from this it's possible to assemble the series of assays used to go from an HTS screen to a clinical development compound. I've put these together in the diagram below, as a linear graph. It's interesting to see that the majority of distinct assay types are connected with ADMET properties rather than efficacy. To be clear, this graph is one possible cascade of assays, formulated as a linear string; in reality, not all assays are done on all compounds, and some assays are done in parallel. But I'd still argue it's a useful way to think about the progress of a compound to a drug (especially when this formulation is done at scale across many targets/diseases).

Another key point is that the ADMET assays are generic, i.e. they apply to essentially all drug discovery programs, and so can be happily abstracted out and treated separately (maybe ;) ).

Here's a diagram - I know, I know, it looks like it was done by a small child. Oh, and it is all about red and green blobs after all! Ways of improving it would be to show the numbers of input and output compounds at relevant stages, and maybe to split out the lead discovery assays from the lead optimisation assays (but the paper isn't that clear on this).

The order/parallelism aspect only affects the ADMET assays; the order of the efficacy assays will be as presented. My guess, based on the systems I've looked at so far (not too many), is that in general the efficacy assays will be deployed linearly, which has some good computational properties.
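The linear-cascade view, including the per-stage compound counts suggested above as an improvement to the diagram, can be sketched as a chain of filters. The assay names, cut-offs and compound values below are hypothetical placeholders, not the ones from the Rottmann et al. paper, and the sketch ignores the parallel or reordered ADMET assays discussed above.

```python
from typing import Callable, Dict, List, Tuple

Compound = Dict[str, float]
Stage = Tuple[str, Callable[[Compound], bool]]

def run_cascade(compounds: List[Compound], stages: List[Stage]):
    """Pass compounds through an ordered list of assay filters.

    Returns the surviving compounds and a per-stage log of
    (stage name, compounds in, compounds out).
    """
    log = []
    current = list(compounds)
    for name, passes in stages:
        survivors = [c for c in current if passes(c)]
        log.append((name, len(current), len(survivors)))
        current = survivors
    return current, log

# Hypothetical assays and cut-offs, for illustration only.
STAGES: List[Stage] = [
    ("primary screen", lambda c: c["pct_inhibition"] >= 50),
    ("potency (IC50)", lambda c: c["ic50_nM"] <= 100),
    ("cytotoxicity", lambda c: c["cc50_uM"] >= 10),
    ("microsomal stability", lambda c: c["t_half_min"] >= 30),
]

# Hypothetical compounds with made-up assay readouts.
compounds = [
    {"pct_inhibition": 80, "ic50_nM": 50, "cc50_uM": 50, "t_half_min": 45},
    {"pct_inhibition": 90, "ic50_nM": 20, "cc50_uM": 5, "t_half_min": 60},
    {"pct_inhibition": 30, "ic50_nM": 10, "cc50_uM": 100, "t_half_min": 90},
    {"pct_inhibition": 70, "ic50_nM": 500, "cc50_uM": 100, "t_half_min": 90},
]

survivors, log = run_cascade(compounds, STAGES)
for stage, n_in, n_out in log:
    print(f"{stage:22s} {n_in} -> {n_out}")
```

Because each stage only sees the survivors of the previous one, the order of the stages matters for how many assays get run, which is one reason the linear formulation is convenient computationally.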

Saturday, 2 June 2012

Papers citing the ChEMBL Database

Here is a link to a Google Scholar page detailing papers that cite the NAR Database paper of Gaulton et al. We'll try to assemble these into an archive (for the Open Access ones only, of course) as part of our plans for the new ChEMBL interface.