ChEMBL Resources


Sunday, 29 September 2013

Congratulations to Ben Stauch PhD!

Ben Stauch in the group has just been examined on his thesis, 'Methods for the Investigation of Protein-Ligand Complexes'. This was a tour de force of many techniques - NMR, computational methods and X-ray crystallography. Ben will be around for a few more months, writing things up, and completing/starting some experimental work on Xe complex refinement and characterisation.

Congratulations to Ben from all the group!

In due course, the thesis will be downloadable from the EBI and EMBL websites, and I'll update this post when the files are there.


Friday, 27 September 2013

Team ChEMBL in Action

We usually blog about exciting scientific and technological updates, interesting concepts, ideas and publications within the realm of life sciences and drug discovery. 

This post is slightly different, as it deals with something that might be (even) more important:  

A number of us in the ChEMBL Group (Rita (not in the picture above), Patricia, Felix, Anna, Sam, Anne, Mark, Michal, George, Gerard and Ashwini) are doing a Fun Run at Victoria Park on 12th October to help raise money for Cancer Research UK.  We are doing this to support a colleague who is currently receiving treatment for cancer.

We've set up a JustGiving page which makes donations fast, easy and secure.

Anything you can donate (in almost any currency :)) to this worthwhile cause would be very much appreciated.

The ChEMBL Group

Thursday, 26 September 2013

Document Similarity in ChEMBL - 2

Following up on yesterday's post by George and Mark, I put together a slide, hopefully illustrating the advantages of document comparison using objects other than words alone.


Wednesday, 25 September 2013

Document Similarity in ChEMBL - 1

Many of you will have noticed a new section on the ChEMBL interface, specifically on the Document Report Card page, called Related Documents. It consists of a table listing links to up to 5 other ChEMBL documents (i.e. publications aka papers) that are scored as the most similar to the one featured in the report card. Here's an example.

How does this work? There are examples of related documents sections online, e.g. in PubMed or on various journal publishers' websites. Document 'related-ness' or similarity can be assessed by comparing MeSH keywords or by clustering documents using TF-IDF weighted term vectors. Fortunately, ChEMBL puts a lot of effort into manually extracting and curating the compounds and biological targets from publications, so why not use these as descriptors to assess document similarity instead? As far as we know, this is the first time this approach has been implemented.

So, here's how it works:

Firstly, for each document in ChEMBL, its list of references is retrieved using the excellent EuropePMC web services. By considering documents as nodes which are connected with an edge if one paper cites the other, a directed graph structure emerges. By doing this for all ~50K documents in ChEMBL, you get the massive graph illustrated above in Cytoscape. As a bonus, by measuring the in- and out- degree of the nodes, one could check which are the most cited papers in ChEMBL - but that's the topic of another blog post. This graph could be further annotated with protein target families, authors and institutions, as it has been elegantly done here.
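To make the graph idea concrete, here is a minimal Python sketch of counting in- and out-degrees on a citation graph. The document IDs and citation pairs are made up for illustration, and the edge direction (citing paper to cited paper) is an assumption of this example; the real reference lists come from the EuropePMC web services and are not reproduced here.

```python
from collections import defaultdict

# Hypothetical citation pairs: (citing document, cited document).
citations = [
    ("doc_A", "doc_B"),
    ("doc_A", "doc_C"),
    ("doc_B", "doc_C"),
    ("doc_D", "doc_C"),
]

def degree_counts(edges):
    """Return (in_degree, out_degree) dicts for a directed edge list."""
    in_degree = defaultdict(int)
    out_degree = defaultdict(int)
    for citing, cited in edges:
        out_degree[citing] += 1  # one more reference made by the citing paper
        in_degree[cited] += 1    # one more citation received by the cited paper
    return dict(in_degree), dict(out_degree)

in_degree, out_degree = degree_counts(citations)

# The most cited paper is the node with the highest in-degree.
most_cited = max(in_degree, key=in_degree.get)
```

With edges stored this way, the most cited papers fall out of the in-degree counts, while the out-degree is simply the number of references to other documents in the set.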

Moving on, once a relationship between two documents is established, we need a way to quantify their similarity. As hinted above, we used the normalised overlap of compounds and targets reported in the two documents. This is done using the classic Tanimoto coefficient, so if doc A reports compounds (1,2,3) and doc B reports compounds (3,4,5), their compound Tanimoto similarity T is 1/5 or 0.2. Exactly the same applies for the target-based document similarity. The composite score we use to rank docs in the Related Documents section is simply the maximum of the two individual ones.
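The scoring described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the production code: the function names are hypothetical, the target IDs are invented, and treating two empty sets as having zero similarity is our own choice for the example.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two collections of IDs."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty sets count as dissimilar
    return len(a & b) / len(union)

def document_similarity(compounds_a, compounds_b, targets_a, targets_b):
    """Composite score: the maximum of the compound and target Tanimoto."""
    return max(tanimoto(compounds_a, compounds_b),
               tanimoto(targets_a, targets_b))

# The worked example from the text: one shared compound out of five in total.
compound_score = tanimoto({1, 2, 3}, {3, 4, 5})

# Invented target sets: two shared targets out of three in total.
composite = document_similarity({1, 2, 3}, {3, 4, 5},
                                {"T1", "T2"}, {"T1", "T2", "T3"})
```

Here the compound score is 1/5 and the target score is 2/3, so the composite score used for ranking would be the target score.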

What does all that mean in practice? It means that 2 papers are listed as similar if their reported compounds or biological targets overlap significantly (and one cites the other). For example, papers with follow-up experiments on the same candidate drug will be deemed similar, e.g. this one. The same will apply to two papers that involve kinase panel screening assays. A desirable side-effect is that by following the links, the tenacious user may traverse the whole graph displayed above!

George & Mark 

Tuesday, 24 September 2013

Paper: Benchmarking of protein descriptor sets in proteochemometric modeling (part 2): modeling performance of 13 amino acid descriptor sets

A paper from Gerard in the group on some of his proteochemometric modelling work; a link to the paper is here. Z-scales rule! (the original Sandberg et al J Med Chem paper on the Z-scales was one of my 'lightbulb turning on' moments in my professional life - go hunt it down if you don't know it.)

%T Benchmarking of protein descriptor sets in proteochemometric modeling (part 2): modeling performance of 13 amino acid descriptor sets
%A G.J.P. van Westen
%A R.F. Swier
%A I. Cortes-Ciriano
%A J.K. Wegner
%A J.P. Overington
%A A.P. IJzerman
%A H.W.T. van Vlijmen
%A A. Bender
%J J. Cheminformatics 
%D 2013
%V 5
%O doi:10.1186/1758-2946-5-42


Saturday, 21 September 2013

New Drug Approvals 2013 - Pt. XIII - Dolutegravir (Tivicay™)

ATC code: J05AX12

On 12 August, the FDA approved a further drug for the treatment of HIV-1 infection, Dolutegravir (Tradename: Tivicay). Dolutegravir, also known as S/GSK-1349575, is an HIV-1 integrase inhibitor. The drug has been approved for treatment of treatment-naïve as well as treatment-experienced HIV-infected adults, including those who have been treated with other integrase inhibitors. In addition, Dolutegravir can be used for the treatment of children aged 12 years or older and weighing at least 40 kg who have not been treated with integrase inhibitors, but are either treatment-naïve or treatment-experienced.

HIV, a lentivirus, infects vital cells in the human immune system such as helper T cells (CD4+ T cells) and macrophages. The disease is responsible for millions of deaths every year, especially in Sub-Saharan Africa, where treatment complications are enhanced by co-infection with tuberculosis and by poverty. The approval of a new antiviral agent like Dolutegravir will enhance treatment of the disease and improve the quality of people’s lives.

Dolutegravir is an inhibitor of HIV-1 integrase responsible for the insertion of the viral DNA into the host chromosomal DNA. The drug interferes with replication of HIV by preventing the viral DNA from assimilating into the genetic material of the human T cells. An example of a 3D structure of the enzyme’s core domain (PDBe: 3vqa) is shown below.

HIV-1 integrase (ChEMBLID: CHEMBL3471, UniProt Accession: Q72498) is an attractive target for drug design. It is one of three HIV enzymes (the others being reverse transcriptase and protease) and consists of three main domains with specific functions. The N-terminal domain, characterized by the His2Cys2 motif, chelates zinc; the core domain contains the catalytic DDE motif important for the activity of the enzyme; and the C-terminal domain, with an SH3-like fold, binds DNA nonspecifically. There are a variety of crystal structures of the different domains of HIV-1 integrase reported in PDBe (Protein Data Bank in Europe).

Dolutegravir, ChEMBLID: CHEMBL1229211 (C20H19F2N3O5, IUPAC Name: (4R,12aS)-N-[(2,4-difluorophenyl)methyl]-7-hydroxy-4-methyl-6,8-dioxo-3,4,12,12a-tetrahydro-2H-pyrido[5,6]pyrazino[2,6-b][1,3]oxazine-9-carboxamide, Canonical smiles: CC1CCOC2N1C(=O)C3=C(C(=O)C(=CN3C2)C(=O)NCC4=C(C=C(C=C4)F)F)O) has two chiral centers, a molecular weight of 419.12, 2 hydrogen bond donors, 6 hydrogen bond acceptors, 3 rotatable bonds, a polar surface area of 99.18 and an alogP of 0.3. Dolutegravir is orally administered since it does not violate Lipinski’s ‘Rule of Five’. The drug may be taken with or without food. For treatment-naïve or treatment-experienced, integrase strand transfer inhibitor (INSTI)-naïve adults and children, the recommended dose is 50 mg once daily. A dose of 50 mg twice daily is recommended when dolutegravir is co-administered with potent UGT1A/CYP3A inducers like efavirenz, fosamprenavir/ritonavir, tipranavir/ritonavir or rifampin.
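As a quick illustration of the Rule of Five check, here is a minimal Python sketch that counts violations from the property values quoted above. The function name is hypothetical, the thresholds are the commonly quoted Lipinski cut-offs, and using alogP in place of a measured logP is an assumption of this example.

```python
def rule_of_five_violations(mol_weight, h_donors, h_acceptors, logp):
    """Count Rule of Five violations from precomputed properties."""
    checks = [
        mol_weight > 500,   # molecular weight above 500
        h_donors > 5,       # more than 5 hydrogen bond donors
        h_acceptors > 10,   # more than 10 hydrogen bond acceptors
        logp > 5,           # logP above 5 (alogP used here as a stand-in)
    ]
    return sum(checks)

# Property values for dolutegravir as quoted in the post.
dolutegravir = {
    "mol_weight": 419.12,
    "h_donors": 2,
    "h_acceptors": 6,
    "logp": 0.3,
}

violations = rule_of_five_violations(**dolutegravir)
```

With the values above, none of the four thresholds is exceeded, which is consistent with the statement that the compound does not violate the rule.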

The license holder for Dolutegravir is ViiV Healthcare, an HIV joint venture between GSK, Pfizer Inc and Shionogi. The full prescribing information can be found here.

Thursday, 19 September 2013

Resources for Computational Drug Discovery - Wellcome Trust Course DEADLINE APPROACHING!

It's that time of year again when the ChEMBL team and their collaborators come together to host the "Resources for Computational Drug Discovery" course.

This course has been highly successful and well received over the past 3 years, and this year has no plans to be any different. It will be held here at the EBI campus in Hinxton, Cambridgeshire from the 9th - 13th December 2013.

We will have speakers and instructors from institutes such as the University of California San Francisco, Institute of Cancer Research and University of Sheffield. The course will have both theoretical and practical sessions where the attendees will have a chance to apply what they have just learned.

A provisional program can be found here.

The deadline to sign up is looming (8th October) so click here to register and avoid disappointment!


New Drug Approvals 2013 - Pt. XIV - Tecfidera™

ATC Code: N07XX09 (2014)
Wikipedia: Dimethyl Fumarate

On March 27th the FDA approved Dimethyl Fumarate (DMF, trade name TECFIDERA™) for the treatment of adults with relapsing forms of multiple sclerosis (MS). DMF and its metabolite, monomethyl fumarate (MMF), activate the Nuclear factor (erythroid-derived 2)-like 2 (Nrf2) pathway via inhibition of Kelch-like ECH-associated protein 1 (KEAP1, cytosolic inhibitor of Nrf2).

KEAP1 (CHEMBL2069156) is a naturally occurring cytosolic inhibitor of Nrf2, and DMF/MMF act through chemical modification of KEAP1.

The Nrf2 pathway is the primary cellular defence against the cytotoxic effects of oxidative stress. After translocation to the nucleus, Nrf2 heterodimerizes with MafF, MafG, and MafK. The combined heterodimer binds to the antioxidant/electrophile response element (ARE/EpRE) and subsequently initiates transcription of the genes it controls.

KEAP1 acts as the cytosolic anchor of Nrf2, sequestering Nrf2 in the cytoplasm during basal conditions. In addition, KEAP1 contains a nuclear export signal and is hypothesised to be the primary redox sensor. Thus DMF-mediated inhibition of KEAP1 leads to an increase in Nrf2 translocation and an increase in transcription of ARE/EpRE-controlled genes. This is hypothesised to be the working mechanism of DMF/MMF in MS. In addition, MMF has been shown to be an agonist of the nicotinic acid receptor (CHEMBL3785).

Dimethyl Fumarate (CHEMBL2107333; ChemSpider: 553171; PubChem: 99431554) is a small molecule drug with a molecular weight of 144.1 Da, an AlogP of 0.49 and 4 rotatable bonds, and does not violate the rule of 5.

Canonical SMILES : COC(=O)\C=C\C(=O)OC
InChI: InChI=1S/C6H8O4/c1-9-5(7)3-4-6(8)10-2/h3-4H,1-2H3/b4-3+
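As a small sanity check, the quoted molecular weight can be recomputed from the formula layer of the InChI string. This is a minimal Python sketch: the helper names are hypothetical, the atomic weights are rounded standard values, and the parser only handles simple formulae such as this one (no charges, isotopes or multiple components).

```python
import re

# Rounded standard atomic weights; only the elements needed here.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "O": 15.999}

def formula_from_inchi(inchi):
    """Return the molecular formula layer (second '/'-separated field)."""
    return inchi.split("/")[1]

def molecular_weight(formula):
    """Sum atomic weights for a simple formula such as 'C6H8O4'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing count means a single atom of that element.
        total += ATOMIC_WEIGHTS[element] * (int(count) if count else 1)
    return total

inchi = "InChI=1S/C6H8O4/c1-9-5(7)3-4-6(8)10-2/h3-4H,1-2H3/b4-3+"
formula = formula_from_inchi(inchi)
weight = molecular_weight(formula)
```

For C6H8O4 this gives roughly 144.13, in line with the 144.1 Da quoted above.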

The recommended starting dose of TECFIDERA is 120 mg twice daily, for 7 days. Subsequently the dosage should be increased to a 240 mg twice daily maintenance dose. Tecfidera can be taken with or without food.

In humans, dimethyl fumarate is extensively metabolized by esterases, which are ubiquitous in the gastrointestinal tract, blood, and tissues, before it reaches the systemic circulation. Further metabolism of MMF occurs through the tricarboxylic acid (TCA) cycle, with no involvement of the cytochrome P450 (CYP) system. MMF, fumaric and citric acid, and glucose are the major metabolites in plasma.

Exhalation of CO2 is the primary route of elimination, accounting for approximately 60% of the TECFIDERA dose. Renal and fecal elimination are minor routes of elimination, accounting for 16% and 1% of the dose respectively. Trace amounts of unchanged MMF were present in urine.

The terminal half-life of MMF is approximately 1 hour and no circulating MMF is present at 24 hours in the majority of individuals. Accumulation of MMF does not occur with repeated dosing.

The license holder is Biogen Idec. The full prescribing information can be found here.

Monday, 16 September 2013

ChEMBL_17 Released

We are pleased to announce the release of ChEMBL_17. This version of the database, prepared on 29th August 2013, contains:

  • 1,519,640 compound records
  • 1,324,941 compounds (of which 1,318,187 have mol files)
  • 12,077,491 activities
  • 734,201 assays
  • 9,356 targets
  • 51,277 documents

You can download the data from the ChEMBL FTP site. For more information please read the release notes.

Data changes since the last release:

Drug mechanism of action

For all FDA-approved drugs, information regarding the mechanism of action and associated efficacy targets has been curated from primary sources, such as literature and drug prescribing information. Targets have only been included for a drug if a) the drug is believed to interact directly with the target and b) there is evidence that this interaction contributes towards the efficacy of that drug in the indication(s) for which it is approved.

Metal-containing compounds

Structures for around 3200 metal-containing compounds have been removed from the database (though the bioactivity and other information for these compounds is retained). For more information, please see the previous blog posts:

New data sets

Several new deposited/extracted data sets have also been included in the latest release: two deposited data sets from GlaxoSmithKline for Ghrelin receptor agonists and Motilin receptor agonists, a data set of the results of screening the MMV Malaria Box compound collection for activity against Schistosoma mansoni, two data sets screening the GSK PKIS compound collection for inhibition of luciferase activity, and finally pathology data from the Open TG-GATES project.

Interface changes since the last release:

Browse Drug Targets tab

A new tab has been created to show the new mechanism of action information for FDA approved drugs together with the references from which the information was obtained, and links to the relevant drug/target report card pages.

Document Report Card

A new table has been added to the document report card, showing other ChEMBL documents that are related to the current document. Pair-wise document similarity is assessed by two components. The first component is defined by whether a document cites or is referenced by the other. The second component is defined by the amount of overlap between the compounds and biological targets reported in the two respective documents. This overlap is quantified by the Tanimoto coefficient. Documents with the highest Tanimoto similarity scores to the query document are listed in this section. For example, the following page shows 5 additional ChEMBL documents that are deemed similar to the paper currently being viewed.

Database changes since the last release:

A number of new tables have been added to store the drug mechanism of action information (please see release notes and schema documentation for full details). In addition, a number of minor changes have been made to existing tables:

The PROTEIN_FAMILY_CLASSIFICATION table has been deprecated and replaced by a new hierarchical version: PROTEIN_CLASSIFICATION.

The MOLREGNO field has been removed from the ATC_CLASSIFICATION table and moved to a new mapping table: MOLECULE_ATC_CLASSIFICATION.

The MOLFORMULA field has been moved from the COMPOUND_STRUCTURES table to the COMPOUND_PROPERTIES table (and renamed).
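To show what the new ATC mapping table means for queries, here is a minimal Python sketch using an in-memory SQLite database. It is a toy under stated assumptions: only the table names and the MOLREGNO field come from the notes above, while the LEVEL5 and WHO_NAME columns, the join key and the MOLREGNO value are invented for illustration. Please check the schema documentation for the real column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Toy versions of the two tables. Column names other than MOLREGNO
    -- are assumptions made for this example.
    CREATE TABLE atc_classification (level5 TEXT PRIMARY KEY, who_name TEXT);
    CREATE TABLE molecule_atc_classification (molregno INTEGER, level5 TEXT);
    INSERT INTO atc_classification VALUES ('J05AX12', 'dolutegravir');
    INSERT INTO molecule_atc_classification VALUES (1, 'J05AX12');
""")

# Molecules now reach their ATC codes through the mapping table,
# rather than through a MOLREGNO column on ATC_CLASSIFICATION itself.
rows = conn.execute(
    """
    SELECT m.molregno, a.level5, a.who_name
    FROM molecule_atc_classification AS m
    JOIN atc_classification AS a ON a.level5 = m.level5
    WHERE m.molregno = ?
    """,
    (1,),
).fetchall()
conn.close()
```

Queries that previously read MOLREGNO directly from ATC_CLASSIFICATION would need an extra join of this kind.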

The ChEMBL Team