ChEMBL Resources


Tuesday, 31 July 2012

Webinar: Accessing ChEMBL Web Services via Workflow Tools

As promised before, there will be a webinar on accessing ChEMBL REST web services via KNIME and Pipeline Pilot at 3.30 pm BST on Wednesday, 8th August 2012. Please email me in advance in order to register for this.

Friday, 27 July 2012

New Drug Approvals 2012 - Pt. XVI - Aclidinium bromide (Tudorza™ Pressair™)

ATC Code: R03BB05
Wikipedia: Aclidinium bromide

On July 23rd, the FDA approved Aclidinium bromide (Tradename: Tudorza™ Pressair™; Research Codes: LAS-34273, LAS W-330), a muscarinic acetylcholine M3 receptor antagonist, for the long-term maintenance treatment of bronchospasm associated with chronic obstructive pulmonary disease (COPD).

Chronic obstructive pulmonary disease (COPD) is characterised by the occurrence of chronic bronchitis or emphysema, a pair of commonly co-existing diseases of the lungs in which the airways become narrowed. Bronchial spasms, a sudden constriction of the muscles in the walls of the bronchioles, occur frequently in COPD.

Aclidinium bromide is a long-acting antimuscarinic agent that, through inhibition of the muscarinic acetylcholine M3 receptors present in airway smooth muscle, leads to bronchodilation and consequently eases the symptoms of COPD.

The muscarinic acetylcholine M3 receptor (Uniprot: P20309, ChEMBL: CHEMBL245) belongs to the G-protein coupled receptor (GPCR) type 1 family, and binds the endogenous neurotransmitter acetylcholine. Since it is coupled to a Gq protein, its inhibition leads to a decrease of intracellular calcium levels, and consequently to smooth muscle relaxation.

>ACM3_HUMAN Muscarinic acetylcholine receptor M3

There is one partially resolved 3D structure for this protein (2CSA), but there are now several relevant homologous structures of other closely related members of the family (see here for a current list of rhodopsin-like GPCR structures).

The -ium USAN/INN stem covers quaternary ammonium compounds. Members of this class include, for example, tiotropium bromide (ChEMBL ID: CHEMBL1182657) and ipratropium bromide (ChEMBL ID: CHEMBL1615433), which are also anticholinergic drugs approved for the treatment of COPD.

Aclidinium bromide (IUPAC: [1-(3-phenoxypropyl)-1-azoniabicyclo[2.2.2]octan-3-yl]2-hydroxy-2,2-dithiophen-2-ylacetate bromide; Canonical smiles (for active quaternary amine): OC(C(=O)O[C@H]1C[N+]2(CCCOc3ccccc3)CCC1CC2)(c4cccs4)c5cccs5 ; PubChem: 11467166; Chemspider: 9609381; ChEMBLID: CHEMBL1194325; Standard InChI Key: ASMXXROZKSBQIH-VITNCHFBSA-N) is a synthetic quaternary ammonium compound with one chiral center, a molecular weight of 484.7 Da, 7 hydrogen bond acceptors, 1 hydrogen bond donor, and an ALogP of 3.4. The compound is therefore fully rule-of-five compliant.
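As a rough illustration of the rule-of-five check mentioned above, here is a minimal Python sketch. The function name and the exact thresholds (Lipinski's commonly quoted cut-offs) are assumptions made for this example, not the method ChEMBL itself uses to flag compliance; the property values are the ones quoted in the post.

```python
def lipinski_violations(mol_weight, alogp, h_donors, h_acceptors):
    """Return the rule-of-five criteria that a compound breaks (hypothetical helper)."""
    violations = []
    if mol_weight > 500:
        violations.append("molecular weight > 500 Da")
    if alogp > 5:
        violations.append("logP > 5")
    if h_donors > 5:
        violations.append("more than 5 hydrogen bond donors")
    if h_acceptors > 10:
        violations.append("more than 10 hydrogen bond acceptors")
    return violations


# Property values quoted in the post for aclidinium bromide.
aclidinium = {"mol_weight": 484.7, "alogp": 3.4, "h_donors": 1, "h_acceptors": 7}

# An empty list means no criterion is violated, i.e. fully compliant.
print(lipinski_violations(**aclidinium))
```

The same helper can be pointed at the other compounds discussed on this page; a molecular weight above 500 Da, for instance, would show up as a single violation.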

Aclidinium bromide is available as a dry powder inhaler and the recommended daily dose is two oral inhalations of 400 mcg. It has an apparent volume of distribution of 300 L following intravenous administration of 400 mcg, and its absolute bioavailability is approximately 6%. The estimated effective half-life (t1/2) of aclidinium is 5 to 8 hours.

The major route of metabolism for aclidinium bromide is non-enzymatic and esterase-mediated hydrolysis. It is rapidly and extensively converted to its alcohol and dithienylglycolic acid derivatives, neither of which binds to muscarinic receptors - this leads to very low systemic exposure of the active aclidinium species. Excretion of aclidinium bromide is mainly through the urine (54 - 65%) and faeces (20 - 30%), where only 1% is excreted as unchanged aclidinium. The total clearance is approximately 170 L/h after an intravenous dose of aclidinium bromide in young healthy volunteers.

The license holder for Tudorza™ Pressair™ is Forest Pharmaceuticals, and the full prescribing information can be found here.

Tuesday, 24 July 2012

Some more ways to access ChEMBL

There's now a BioRuby API for ChEMBL, written by the excellent Mitsuteru Nakao (@32nm on Twitter). Bio-chembl is a BioRuby plugin built on top of the ChEMBL REST web services.

Of course, there is also the pychembl project in Python from Markus Sitzmann of the NCI.

And I've just been reminded of the R ChEMBL package from Rajarshi Guha.

However, the real mark of success will be when we get a ChEMBL f77 library!

Sunday, 22 July 2012

New Drug Approvals 2012 - Pt. XV - Carfilzomib (Kyprolis™)

ATC Code: L01XX
Wikipedia: Carfilzomib
On July 20th 2012 the FDA approved Carfilzomib (Kyprolis™) for the treatment of patients with multiple myeloma who have received prior therapies including bortezomib and an immunomodulatory therapy, and who have demonstrated disease progression within 60 days of completion of the previous therapy. Multiple myeloma is a cancer of the plasma cells, usually found in the bone marrow. Five-year survival is about 40% in the UK and the USA.

Carfilzomib is a chirally defined modified tetrapeptidyl epoxide substrate analogue with a molecular weight of 719.9. The molecular formula is C40H57N5O7. (SMILES= O=C(N[C@H](C(=O)N[C@H](C(=O)N[C@H](C(=O)N[C@H](C(=O)[C@@]1(OC1)C)CC(C)C)Cc2ccccc2)CC(C)C)CCc3ccccc3)CN4CCOCC4)

Cmax and AUC following a single intravenous dose of 27 mg/m2 were 4,232 ng/mL and 379, respectively. The mean steady-state volume of distribution of a 20 mg/m2 dose of carfilzomib was 28 L. Carfilzomib was rapidly and extensively metabolised, via peptidases and epoxide hydrolysis - i.e. non-CYP mediated metabolism. After intravenous administration of ≥ 15 mg/m2, carfilzomib was rapidly cleared from the systemic circulation with a half-life of ≤ 1 hour.

Carfilzomib is a second generation, non-competitive, irreversible proteasome inhibitor. It contains the unusual (for drugs) epoxide group responsible for the irreversible binding to the target. It is differentiated from the first generation proteasome inhibitor bortezomib (boron-based) by this irreversible inhibition mechanism, which is believed to contribute to overcoming bortezomib resistance. Additionally, clinical trials have shown that carfilzomib is associated with a lower incidence of peripheral neuropathy (PN) (incidence reported in 14% of patients, with 1% having Grade 3 PN) in comparison with bortezomib (36% and 24% Grade 3). Epoxides are quite reactive, and can react with many proteins in a biological system.

The molecular target for carfilzomib is the 20S catalytic core of the large multi-protein complex - the proteasome. Specifically, it binds to the N-terminal threonine-containing active site of the endopeptidase proteasome subunit beta type-7 (Uniprot: Q99436).

The amino acid sequence is:

>sp|Q99436|PSB7_HUMAN Proteasome subunit beta type-7 OS=Homo sapiens GN=PSMB7 PE=1 SV=1

Prescribing information can be found here.

The licence holder is Onyx Pharmaceuticals and the product website is

Allosterism, Allosterically Regulated Targets, and Drug Discovery

When a screen and follow-up have failed to deliver an attractive lead for a 'hard' target, talk amongst the team often turns to running a screen for an allosteric inhibitor. Allosteric regulators are therefore a seductive/tantalising approach to finding leads against tough targets such as protein-protein interactions, highly polar binding sites, etc.

But there are a few hard questions to answer; these are just some of them.

  • Just how do I run a screen for an allosteric inhibitor?
  • What compounds do I select for allosteric sites?
  • How likely am I to find an allosteric inhibitor? 
  • What sorts of targets can be allosterically modulated?
  • What are the 3-D properties of allosteric sites?

I've read some of the literature behind the arguments for allosteric inhibitors, and I must say I'm not that convinced as to their general potential benefits at the moment - of course in some cases they will be perfect, but are the advantages as general as people may think? But I've been wrong so many times in my life by now that it's always worth looking at the actual data. So, as part of the SMS-Drug project, we're putting together (and here 'we' means Gerard) an overview of allosteric binders within ChEMBL - hopefully leading to some annotation of the class of binding of compounds to their targets.

Here (below, click for larger), is an initial overview of the target classes from ChEMBL for which we find evidence of allosteric regulation - this is not the same as the a priori distribution of targets, and so it looks interesting.....

Hopefully, the full story of this analysis will appear in printed form reasonably soon.

Wednesday, 18 July 2012

ChEMBL 14 Released

We are pleased to announce the release of ChEMBL_14. This latest version of the ChEMBL database contains:
  • 1,384,479 compound records
  • 1,213,242 distinct compounds
  • 644,734 assays
  • 10,129,256 bioactivities
  • 9,003 targets
  • 46,133 documents
  • 10 data sources
As well as updates to the scientific literature and PubChem data sources, this release also includes data from 2 new sources:
  • DrugMatrix - in vitro pharmacology assays for 870 therapeutic, industrial and environmental chemicals against 132 protein targets.
  • GSK Published Kinase Inhibitor Set - two data sets screening this compound library have been deposited by Nanosyn and the University of North Carolina.
On the interface, we have also added some new compound cross references to Gene Expression Atlas, Drugs of the Future (subset of PubChem), IUPHAR, NIH Clinical Collection and ZINC. On the target report card pages we have added cross references to CanSAR, Gene Ontology, IntAct, InterPro, IUPHAR, MICAD, Reactome and Wikipedia.

You can download the ChEMBL_14 data from our FTP site, but please refer to the chembl_14 release notes for a full list of updates and changes, and also for details on planned schema changes in forthcoming ChEMBL releases.

Friday, 13 July 2012

ChEMBL delivers a new malaria data service

As also announced here, a new malaria data service, sponsored by the Medicines for Malaria Venture (MMV), is available today to researchers around the globe.

The service provides access to hundreds of thousands of data points on malaria-related compounds, assays and targets, thus facilitating research for this neglected tropical disease. Inspired by the successful ChEMBL interface, a user may query the database using keywords, synonyms, chemical structures or protein sequences, review and filter the hits using tables or charts and then download the resulting subset. Data provenance is also provided so that the user can filter on the data sources they are interested in, such as scientific literature, the GlaxoSmithKline TCAMS, Novartis GNF and St Jude datasets or the MMV open access Malaria Box.

Based on the ChEMBL update cycle, the malaria database will be regularly updated with depositions by academic and industrial groups who wish to share their malaria screening data with the rest of the research community. More depositions are currently being processed and will be available shortly.

Thursday, 12 July 2012

Carbon and Nitrogen and Oxygen

There was a post a few weeks ago about elemental compositions - simple cases of carbon oxides, nitrogen oxides, etc. Well, I've processed ChEMBL to do the analysis across all of the compounds. It is a very simple analysis: select the formulas of all compounds, keep only those that contain no elements outside the set CHNO, and then plot the heavy element (i.e. C, N or O) fractions on a ternary plot.
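The filtering and fraction steps described above could be sketched along these lines in Python. This is only an illustration, not the script used for the plot: the function names are made up for this example, and the formula parser assumes plain formulas such as C21H24N4O2S, with no brackets, charges or hydrates.

```python
import re

ALLOWED = {"C", "H", "N", "O"}   # compounds with any other element are skipped
HEAVY = ("C", "N", "O")          # the three axes of the ternary plot


def parse_formula(formula):
    """Count atoms in a simple molecular formula (no brackets or charges)."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def cno_fractions(formula):
    """Return (C, N, O) heavy-atom fractions, or None if the formula is excluded."""
    counts = parse_formula(formula)
    if not counts or not set(counts) <= ALLOWED:
        return None
    heavy_total = sum(counts.get(element, 0) for element in HEAVY)
    if heavy_total == 0:
        return None
    return tuple(counts.get(element, 0) / heavy_total for element in HEAVY)


# Carfilzomib (CHNO only) is kept; mirabegron contains sulfur and is dropped.
for formula in ("C40H57N5O7", "C21H24N4O2S"):
    print(formula, cno_fractions(formula))
```

Each returned triple sums to one, so it can be passed directly to whatever ternary plotting tool is at hand.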

Here it is (click image for larger).
So, the compounds that chemists make are mostly carbon-based - hardly a big surprise, that's why they are organic. There are no contours on this version of the plot, but things get more interesting when you think about using these data as filters or heuristics: as expectation values for the sort of things that chemists could make, for spotting unusual compounds, etc. More later on this...

Update: Here is the plot for drugs. As you will see there are some differences...

Tuesday, 10 July 2012

Paper - Combinatorial Drug Therapy For Cancer In The Post-genomic Era

There's a great paper just out from some of our collaborators - Bissan Al-Lazikani, Udai Banerji and Paul Workman, of the ICR. It reviews drug combination strategies for cancer, and some of the molecular features of effective drug combinations.

A link to the paper is here.

%A B. Al-Lazikani
%A U. Banerji
%A P. Workman
%T Combinatorial Drug Therapy For Cancer in the Post-genomic Era
%J Nat. Biotech.
%V 30
%P 1-13
%D 2012

Compound Clean Up and Mapping (Posted by Louisa)

This new blog post has been created due to popular demand and user requests. I hope that this is useful for you.

After being manually extracted from the primary literature, a compound can only be loaded into the ChEMBL database after it has been run through our in-house clean-up protocol. This protocol utilises Accelrys's Pipeline Pilot software and has evolved a lot over the past three years. The clean-up protocol is used to prevent any structures from being loaded that could be incorrect, not properly charged or contain bad valences. We also use it to map the structures to already existing compounds in ChEMBL.

Historically, the clean-up protocol was very simple, with just a few components to squeeze out any unwanted structures. Initially, we were mostly concerned about uneven charges (e.g. charged counter ion but neutral parent) or quaternary nitrogen-containing compounds without a counter ion at all. Over the past 14 releases, the clean-up has become more sophisticated and now takes into account steroid backbone stereochemistry, inorganic salts and bad valences, amongst other things. A lot of the additions and adjustments have come from stumbling across little subsets of compounds that we hadn't thought of looking for before, and that then warranted a consistent clean-up. This work has also led to the development of a series of business rules that are applied to represent a functional group consistently (for example, nitro groups).

Once the new compounds have been cleaned up, they need to be checked to see if they already exist in the database. Initially, this was done by mapping to the standard InChI, but it was soon realised that not all papers display the correct structure, if any structure at all, or it may not be extracted from the publication exactly as shown. This was causing duplicate compounds to be loaded into ChEMBL. Therefore, it was decided that a better initial mapping would be to use the extracted compound name and compare it to the many stored compound names in the database. This greatly reduced the number of duplicate entries. Once the new compounds have been mapped on the name, the remaining compounds are then mapped on their standard InChI. For those that don't match either a name or an InChI, I create a text file of their names and cast an eye over them to see if there are any that can still be mapped. I have been able to catch a couple of odd ones now and again via this last check, so it's definitely worth doing.
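The name-first, InChI-second mapping order described above can be sketched in a few lines of Python. To be clear, this is not the in-house Pipeline Pilot protocol: the function, the dictionary layout, the identifiers and the placeholder InChI strings are all invented for illustration, and name matching here is a naive case-insensitive comparison.

```python
def map_compounds(new_compounds, known_by_name, known_by_inchi):
    """Map new compounds to existing IDs: by name first, then by standard InChI.

    Returns the mapped compounds and the names left over for manual review.
    """
    mapped, unmapped = {}, []
    for compound in new_compounds:
        name_key = compound["name"].strip().lower()
        if name_key in known_by_name:
            mapped[compound["name"]] = (known_by_name[name_key], "name")
        elif compound.get("inchi") in known_by_inchi:
            mapped[compound["name"]] = (known_by_inchi[compound["inchi"]], "inchi")
        else:
            unmapped.append(compound["name"])
    return mapped, unmapped


# Made-up lookup tables and records; the IDs and InChI strings are placeholders.
known_by_name = {"aspirin": "ID-1"}
known_by_inchi = {"InChI=placeholder-2": "ID-2"}
new_compounds = [
    {"name": "Aspirin", "inchi": "InChI=placeholder-1"},      # matched on name
    {"name": "compound 7b", "inchi": "InChI=placeholder-2"},  # matched on InChI
    {"name": "compound 9", "inchi": None},                    # left for manual check
]

mapped, unmapped = map_compounds(new_compounds, known_by_name, known_by_inchi)
print(mapped)
print(unmapped)
```

The leftover names in `unmapped` correspond to the text file that gets checked by eye in the last step.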

The compound clean-up is always a 'work in progress' and open to new suggestions for filtering out any compound or group of compounds that could do with further checking. If anyone would like to know more about the clean-up protocol or to send us some suggestions, please email

Conference: New Horizons in Toxicity Prediction, September 2012, Cambridge, UK

A great looking conference on toxicity prediction to be held at Downing College, Cambridge on September 5th and 6th 2012 - a fantastic line up of speakers, and organised by one of our favourite collaborators LHASA Limited.

Details of the conference, registration, etc. are here.

Monday, 9 July 2012

WT Course - Computational Approaches to Drug Discovery

So, the course is over for another year. It was really good fun to do, and thanks to all the attendees who made it really rewarding for us all. There is plenty for us to think about in what you asked, and we'll try and include a few more things in the ChEMBL interface. An especial thanks goes out to our visiting lecturers - John Irwin, Noel O'Boyle, Markus Sitzmann, Val Gillet, Andreas Bender, Darren Green, Bissan Al-Lazikani and Mike Barnes.

The picture is of (most of) the course attendees and the faculty on the last day. It was raining, much like every day that week, so it's an indoor photo. It looks cool.

Till next year!!

EIPOD postdocs 2012

Just a reminder of the opening of applications for the EMBL postdoc scheme - EIPOD for 2012. Get in touch if you're interested in our draft project.

Friday, 6 July 2012

Public Popular Chemistry Databases and Licensing

Licensing of data, and copyright, is a complex thing, and always gets people hot under the collar! Some time ago, following consultation with our funders, we settled on a CC-BY-SA license for ChEMBL - this does a couple of things, but primarily it places an explicit license on the data so it is clear what you can do with it. There is a lot of hot air and active discussion over how 'public' and 'open' particular licenses are, but the CC-BY-SA 3.0 license made sense to us (for reference, this license is also used by the world's premier open resource, Wikipedia - here is their license).

We apply this license to each release of the database, and it makes the data freely available and usable. It requires attribution, so that users of derivative works know where the data comes from and can identify the funders and producers of the work - which we think is fair and appropriate. Finally, it applies 'share-alike', so that if you distribute the ChEMBL data further you shouldn't restrict the rights of your users to further give away, use, remix, etc. the data. To be clear though, it does not preclude commercial use of the data.

The Share Alike clause is the thing that people usually get excited by, but this is meant to ensure that if you distribute the data to others you make sure that you also give those you distribute it to the same right to redistribute. If you have a significant problem with this, then don't redistribute any data you get under a SA license; if you find it difficult to keep track of the provenance of data entering your systems, should you really be building stuff to distribute anyway? ;)

Chemistry is a relatively odd world compared to bioinformatics, in that users are generally worried about inadvertent disclosure of their ideas and queries. The basis of the concern is that running a search over the Internet, over an open network, can amount to disclosure and 'publication', and could in theory void a patent through prior art disclosure. As you are probably aware, it is easy to monitor and record traffic over public networks, recovering passwords, etc. Secondly, there is a general suspicion over what happens to usage records on the servers - do the providers of the web service mine the queries? Sell them on? And other sorts of paranoid stuff. Well, maybe it is not as paranoid as you may think, since several large internet sites explicitly state in their Terms and Conditions that your query becomes their intellectual property. Sloppy programming, in particular with advertiser sites, can disclose a whole bunch of query data to advertisers.

One way of dealing with the former issue is to provide access over the https: protocol. This encrypts the traffic between your client browser and the server, and also prevents impersonation of the server by another machine (to most reasonable intents and purposes this is still true). The same secure http: protocol can be applied to make programmatic web services secure too.

There are a number of large 'free'/Open chemical databases at the moment, and we drew up a little table the other day comparing the license and access. It's not complete, probably contains some errors, so if anything is wrong, please let me know. If there are other Open resources to add, put something in the comments, and I'll add it to the table.

Update - thanks to Richard of the RSC, I've corrected some ambiguities in the original table.

Resource | Url (http: protocol form) | License | Download | https:
PubChem | | clear license statement | yes | yes
ZINC | | redistribution of significant subsets without permission | yes | yes, but certificate expired
ChemSpider | | download, and limited to a total 5,000 data entries stored locally. API access free to academic users, others by agreement. | no | yes, but broken

Postdoc project in in silico/biochemical target prediction

We have an interdisciplinary postdoc project available as part of EMBL's EIPOD program (details here). The project is with Matthias Willmanns, based at EMBL Hamburg, and the appointee will spend time in both labs in a combined computational and experimental project aimed at discovering the mode of action of high-throughput screening hits from an anti-tuberculosis assay.

Further details of the project are available here. This is deliberately brief, and candidates are meant to flesh out the project design as part of the application process.

The deadline for applications is 5pm CEST 13th September 2012.

Thursday, 5 July 2012

New Drug Approvals 2012 - Pt. XIV - Mirabegron (Myrbetriq™)

ATC Code: G04BD (incomplete)
Wikipedia: Mirabegron

On June 28 2012, the FDA approved Mirabegron (tradename: Myrbetriq; Research Code: YM-178), a novel, first-in-class selective β3-adrenergic receptor agonist indicated for the treatment of overactive bladder (OAB) with symptoms of urge urinary incontinence, urgency, and urinary frequency. OAB syndrome is a urological condition defined as urinary urgency, usually accompanied by frequency and nocturia, with or without urge urinary incontinence, in the absence of urinary tract infection or other obvious pathology. Mirabegron acts by relaxing the detrusor smooth muscle during the storage phase of the urinary bladder fill-void cycle through activation of the β3-receptor, which in turn increases bladder capacity.

Other treatments for OAB are already on the market and these include treatments with antimuscarinic drugs, such as Flavoxate (approved in 1970; tradename: Urispas; ChEMBL: CHEMBL1493), Oxybutynin (approved in 1975; tradenames: Ditropan, Ditropan XL, Oxytrol, Gelnique, Anturol; ChEMBL: CHEMBL1231), Tolterodine (approved in 1998; tradenames: Detrol, Detrol LA; ChEMBL: CHEMBL1382), Trospium (approved in 2004; tradenames: Sanctura, Sanctura XR; ChEMBL: CHEMBL1201344), Solifenacin (approved in 2004; tradename: Vesicare; ChEMBL: CHEMBL1200803), Darifenacin (approved in 2004; tradename: Enablex; ChEMBL: CHEMBL1346) and Fesoterodine (approved in 2008; tradename: Toviaz; ChEMBL: CHEMBL1201764). While these drugs act by inhibiting the muscarinic action of acetylcholine, Mirabegron represents the first β3-receptor agonist ever to reach the market.

The β3-receptor (ChEMBL: CHEMBL246; Uniprot: P13945) is a 408 amino-acid long G protein-coupled receptor (GPCR), belonging to the rhodopsin family (PFAM: PF00001; subfamily A17). Crystal structures of the closely related β1- and β2-receptors are known and act as good frameworks for understanding the mode of action of Mirabegron.

>ADRB3_HUMAN Beta-3 adrenergic receptor

Mirabegron is a synthetic chiral small molecule, with a molecular weight of 396.51 Da, an AlogP of 2.26, 4 hydrogen bond donors and 5 hydrogen bond acceptors, and is thus fully rule-of-five compliant. (IUPAC: 2-(2-amino-1,3-thiazol-4-yl)-N-[4-[2-[[(2R)-2-hydroxy-2-phenylethyl]amino]ethyl]phenyl]acetamide; Canonical Smiles: C1=CC=C(C=C1)[C@H](CNCCC2=CC=C(C=C2)NC(=O)CC3=CSC(=N3)N)O; InChI: InChI=1S/C21H24N4O2S/c22-21-25-18(14-28-21)12-20(27)24-17-8-6-15(7-9-17)10-11-23-13-19(26)16-4-2-1-3-5-16/h1-9,14,19,23,26H,10-13H2,(H2,22,25)(H,24,27)/t19-/m0/s1)

The recommended starting dosage of Mirabegron is 25 mg once daily, with or without food, and is effective within 8 weeks. Depending on individual patient efficacy and tolerability, the dose may be increased to 50 mg once daily.

Mirabegron has a bioavailability of 29% at a dose of 25 mg, which increases to 35% at a dose of 50 mg, a volume of distribution (Vd) of approximately 1670 L and a moderate plasma protein binding of ca. 71%. Mirabegron is metabolized via multiple pathways involving dealkylation, oxidation, glucuronidation and amide hydrolysis. Studies have suggested that although the CYP3A4 and CYP2D6 isoenzymes play a role in the oxidative metabolism of Mirabegron, their role in the overall elimination is limited. In addition to these isoenzymes, the metabolism of Mirabegron may also involve butyrylcholinesterase, uridine diphospho-glucuronosyltransferases and alcohol dehydrogenase. Two major inactive metabolites were observed in human plasma and these represent 16% and 11% of the total exposure. Mirabegron total clearance (CLtot) from plasma is ca. 57 L/h, with a terminal half-life of approximately 50 hours. Renal clearance (CLR) is approximately 13 L/h, which corresponds to nearly 25% of CLtot. The urinary elimination of unchanged Mirabegron is dose-dependent and ranges from ca. 6% after a daily dose of 25 mg to 12.2% after a daily dose of 100 mg.

The license holder is Astellas Pharma Inc. and the full prescribing information of Mirabegron can be found here.

ChEMBL Is Alive! Part 1 - posted by Louisa

'ChEMBL Is Alive' is to show that ChEMBLdb is a living database that is constantly being worked on by a number of people. As the Chemical Curator for ChEMBL, I (Louisa Bellis) thought it would be interesting for our Blog readers to find out what goes on behind the scenes at 'ChEMBL Towers' and to get regular updates on what we are doing to the data between releases and in response to user emails sent to

As well as being the chemical curator, I also deal with most of the help-desk traffic, where users can email in and let us know of any errors that they may have found, or even to suggest an improvement or enhancement for the interface.

As an example of the work that is done to ChEMBL on an ongoing basis, I thought it would be good to give a brief summary of some of the chemical curation that occurred during the month of June 2012:

An external user pointed out to me that they had come across a 'few' compounds that had the same canonical SMILES string, but had different standard InChI strings. I created a spreadsheet of these duplicate SMILES, which came to a whopping 967 lines. Of these, just over 100 lines were due to E/Z isomerism, some needed to be merged for being incorrect and the rest were checked individually to see why the SMILES were the same. It turned out that there was an issue with the molfiles so each of these compounds was redrawn from scratch. This came to 1,112 compound redraws in all which will be loaded into ChEMBL as soon as possible and will be visible to external users in the ChEMBL_15 release (expected end of November 2012).
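A check like the one described above, finding compounds that share a canonical SMILES but have different standard InChI strings, can be sketched as a simple grouping step. This is an illustrative Python sketch rather than the actual curation workflow; the function name is made up, and the two example records (the E and Z isomers of but-2-ene written with a stereo-free SMILES) are included only to show the E/Z case mentioned in the post.

```python
from collections import defaultdict


def smiles_with_conflicting_inchis(records):
    """Group (smiles, inchi) pairs and keep SMILES linked to more than one InChI."""
    inchis_by_smiles = defaultdict(set)
    for smiles, inchi in records:
        inchis_by_smiles[smiles].add(inchi)
    return {
        smiles: sorted(inchis)
        for smiles, inchis in inchis_by_smiles.items()
        if len(inchis) > 1
    }


# Illustrative records: the same stereo-free SMILES paired with E and Z InChIs,
# plus one unambiguous compound that should not be reported.
records = [
    ("CC=CC", "InChI=1S/C4H8/c1-3-4-2/h3-4H,1-2H3/b4-3+"),
    ("CC=CC", "InChI=1S/C4H8/c1-3-4-2/h3-4H,1-2H3/b4-3-"),
    ("CCO", "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"),
]

conflicts = smiles_with_conflicting_inchis(records)
print(len(conflicts), "SMILES with more than one InChI")
```

Each entry in `conflicts` would then be one line of the spreadsheet to be checked by hand.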

I also started working on a list of duplicate names in the ChEMBL database. This was to support my own workflow and was not suggested by our users - it produced a list of 9,952 duplicate names. However, not all duplicate names are actual duplicates that need to be merged; two compounds can share the same simple chemical name when that name does not reflect that they are enantiomers of each other. This work is still ongoing, but I have been able to redraw and merge about 100 compounds as a direct result of this list. I am only about 10% of the way through this spreadsheet, so I can say that it will keep me busy for a little while yet.

In June, we also received two emails from users to let us know that they had found what they believed were errors. In one case, the units had been incorrectly extracted from the paper as nM, when they were in fact uM. Upon checking the paper, I could see where the confusion had arisen. I could see that it had one table where they displayed uM and all the rest of the tables were nM, so the extractor had not seen this difference. These have now been fixed and will be visible in ChEMBL_14 (due for release end of July 2012).
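To show why a unit mix-up like this matters, here is a small Python sketch that normalises concentrations to nM. The conversion table and function name are assumptions for this example and are not taken from the ChEMBL loading code; the point is simply that a value recorded as nM when the paper meant uM is off by a factor of 1,000.

```python
# Conversion factors to nanomolar for a few common concentration units.
UNIT_TO_NM = {"nM": 1.0, "uM": 1_000.0, "mM": 1_000_000.0}


def to_nanomolar(value, unit):
    """Convert a concentration to nM; raises KeyError for an unknown unit."""
    return value * UNIT_TO_NM[unit]


reported = 2.5  # a made-up activity value as printed in a table
as_extracted = to_nanomolar(reported, "nM")   # unit wrongly recorded as nM
as_published = to_nanomolar(reported, "uM")   # unit the paper actually used

# The ratio is the size of the error introduced by the wrong unit.
print(as_published / as_extracted)
```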

The other email I'll mention here was to do with target assignment, where we had assigned a target to some data, and the user had read the paper and believed that the data was incorrectly assigned. This is still being checked by our biological curator, but if found to be incorrect, will be changed immediately in the database.

These are both great examples of users helping us to improve the quality of data in ChEMBL.

I hope to add more curation information in the future, but if there is anything specific that you would like to see me blog about (relating to curation or error checking) then please let me know.

Tuesday, 3 July 2012

Access to ChEMBL web services via workflow tools

As some of you may know, besides the ChEMBL web interface and the SQL dumps, there is another way to access and retrieve data from your favourite public database, namely the RESTful web services. We have already provided API examples there using Java, Python and Perl but, as of today, we also provide examples for popular pipelining / workflow tools, such as Pipeline Pilot and KNIME.

The user base of such tools is certainly growing, as they offer modularity, flexibility, transparency and better integration and sharing capabilities compared to programming or standard software packages. In fact, demand for tighter integration between ChEMBL and these tools was one of the outcomes of our recent Workflows workshop. Indeed, using the web services via a workflow tool is a seamless way to search, retrieve, integrate and analyse data without having to install, maintain and update local databases or write the (for some, dreaded) SQL queries.

We have posted some useful examples on the Pipeline Pilot and KNIME community fora, which are available to download.

If there is interest expressed in the comments below, we can also provide a webinar, say on Tuesday 31 July 2012 at 3pm.

Seminar: Discovery of Viagra/Revatio (aka sildenafil aka UK-92480)

A reminder of an on campus, open, seminar from Andy Bell (now at Imperial College), detailing the discovery and development of UK-92,480 (also known as sildenafil and even better known as V1agra and R3vat10). Andy was one of the medicinal chemists and inventors on the PDE-5 inhibitor programme at Pfizer, and the story covers many aspects of drug discovery including, of course, the discovery of the side effect, and also one where the pharmacology led to many new molecular insights into NO signalling and PDE biology.

There are many myths about the discovery of V1agra, so this is a rare opportunity to hear the exciting story first-hand.

Here's Andy's abstract....

"Viagra™ (sildenafil) is a unique example of a chemical tool being used to
discover the linkage between a biological mechanism and a disease through
clinical trials. The presentation will describe the discovery of sildenafil
and its use in defining the role of cGMP phosphodiesterases (particularly
PDE5) in human diseases such as Male Erectile Dysfunction (MED) and Pulmonary
Arterial Hypertension (PAH). These clinical studies, combined with the
discovery of additional PDE isoforms, were used to define a desirable profile
for subsequent 2nd generation PDE5 inhibitors. The impact of structural
biology and high throughput screening on the discovery of further clinical
candidates will also be discussed."

The seminar is on Tuesday July 17th 2012 at 2pm in room M203 (room change alert for those who put it in their diary earlier) - you will need to mail me in order to get registered with campus security if you don't work on campus. If you do, you can just turn up.

Andy will also be giving a more detailed technical seminar on screening file analysis, diversity, chemical space, etc - again, let me know if you are interested in attending this too......