ChEMBL Resources


Wednesday, 27 March 2013

Meeting up at the ACS in New Orleans

A couple of us are out in New Orleans for the ACS in a few weeks - if any ChEMBL users would like to meet up to get some training, ask questions, suggest improvements to what we do, or just grab a beer or coffee, we'd be very happy to do so. We arrive Sunday and leave Tuesday night.

The picture is rather amusing, and looks a little like Mark and me, sort of. Credit for image is detailed in the image itself.....

jpo and mark

Saturday, 23 March 2013

Paper: Brain: Biomedical Knowledge Manipulation

There's a paper just out from Samuel, one of the PhD students in the group - the link to the Open Access paper is here. Brain is a Java software library facilitating the manipulation and creation of ontologies and knowledge bases represented with the Web Ontology Language (OWL).

%A S. Croset
%A J.P. Overington
%A D. Rebholz-Schuhmann
%D 2013
%T Brain: Biomedical Knowledge Manipulation
%J Bioinformatics
%V 29
%O DOI:10.1093/bioinformatics/btt109
%O PMID:23505292

Friday, 22 March 2013

New Drug Approvals 2013 - Pt. IV - Ospemifene (OSPHENA®)

ATC Code: Not Assigned
Wikipedia: Ospemifene

On February 26, the FDA approved Ospemifene (Trade Name: OSPHENA; PubChem: CID 3036505, ChEMBL: CHEMBL2105395, ChemSpider: 2300501) for the treatment of moderate to severe dyspareunia, a symptom of vulvar and vaginal atrophy due to menopause.

Dyspareunia is pain during or after sexual intercourse. It can affect men, but is significantly more common in women, affecting up to one-fifth of women at some point in their lives. Women with dyspareunia may have pain in the vagina, clitoris or labia. This may be due to medical or psychological causes. There are numerous medical causes of dyspareunia, including vaginismus, pelvic inflammatory disease, genital or pelvic tumors, urethritis, urinary tract infection, vaginal atrophy, vaginal dryness, vulvar cancer, childbirth trauma (postpartum), skin conditions (lichen sclerosus, lichen planus, eczema, psoriasis), female genital mutilation and endometriosis, many of which are treatable.

Ospemifene is a novel selective estrogen receptor modulator (SERM), one of a class of compounds that act on estrogen receptors (ERs). What distinguishes SERMs from pure receptor agonists and antagonists is that their action differs between tissues, which makes it possible to selectively inhibit or stimulate estrogen-like action in particular tissues (PubMed). Ospemifene is an estrogen agonist/antagonist with tissue-selective effects. Its biological actions are mediated through binding to estrogen receptors (Short Name: ER, ESR; UniProt: Q92731 and P03372; ChEMBL: CHEMBL2093866). This binding results in activation of estrogenic pathways in some tissues (agonism) and blockade of estrogenic pathways in others (antagonism).

The mechanism of action of SERMs is mixed agonism/antagonism, which may differ depending on the chemical structure but, for at least some SERMs, appears to be related to:
1. The ratio of co-activator to co-repressor proteins in different cell types.
2. The conformation of the estrogen receptor induced by drug binding, which in turn determines how strongly the drug/receptor complex recruits co-activators (resulting in an agonist response) relative to co-repressors (resulting in antagonism).

The protein sequences of human ER-alpha (ESR1) and ER-beta (ESR2) can be downloaded in FASTA format from the link here (courtesy of UniProt).

Compound Name : Z-2-[4-(4-chloro-1,2-diphenylbut-1-enyl)phenoxy]ethanol
Canonical SMILES : OCCOc1ccc(cc1)\C(=C(\CCCl)/c2ccccc2)\c3ccccc3
Standard InChI : InChI=1S/C24H23ClO2/c25-16-15-23(19-7-3-1-4-8-19)24(20-9-5-2-6-10-20)21-11-13-22(14-12-21)27-18-17-26/h1-14,26H,15-18H2/b24-23-
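As a quick sanity check on the identifiers above, the molecular formula can be read from the formula layer of the Standard InChI and turned into an average molecular weight. This is a minimal sketch in plain Python rather than a cheminformatics toolkit; the function names and the small table of standard atomic weights are assumptions made for illustration, and the parser only handles simple formulas like this one.

```python
import re

# Standard atomic weights for the elements in this compound only
# (illustrative table, not a complete periodic table).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "Cl": 35.45, "O": 15.999}


def formula_from_inchi(inchi):
    """Return the formula layer, the second '/'-separated field of an InChI."""
    return inchi.split("/")[1]


def molecular_weight(formula):
    """Sum atomic weights for a simple formula such as 'C24H23ClO2'."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total


inchi = (
    "InChI=1S/C24H23ClO2/c25-16-15-23(19-7-3-1-4-8-19)24(20-9-5-2-6-10-20)"
    "21-11-13-22(14-12-21)27-18-17-26/h1-14,26H,15-18H2/b24-23-"
)
formula = formula_from_inchi(inchi)
mw = molecular_weight(formula)
print(formula, round(mw, 1))
```

Rounded to one decimal place, the result agrees with the molecular weight quoted for Ospemifene.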

Ospemifene, an ER agonist/antagonist, has a molecular weight of 378.9. The recommended dosage is 60 mg, available as a tablet for oral administration. After a single dose of Ospemifene under fasted conditions, mean Cmax and AUC (0 to infinity) were 533 ng/mL and 4165 ng·hr/mL, respectively. With a high fat/high calorie meal, mean Cmax and AUC (0 to infinity) were 1198 ng/mL and 7521 ng·hr/mL, respectively. It is highly bound to serum proteins (>99%), with an apparent volume of distribution of 448 L. Ospemifene is primarily metabolised via CYP3A4, CYP2C9 and CYP2C19, and the major metabolite is 4-hydroxyospemifene. Clearance of Ospemifene was 9.16 L/hr and the terminal half-life was 26 hrs. Following oral administration of Ospemifene, approximately 75% and 7% of the dose was excreted in feces and urine, respectively.
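The size of the food effect is easier to see as a ratio of fed to fasted exposure. The sketch below only divides the mean values quoted above; the variable and function names are made up for illustration, and this is simple arithmetic, not a pharmacokinetic model.

```python
# Mean single-dose values quoted in the text (fasted vs. fed).
fasted = {"cmax": 533.0, "auc": 4165.0}
fed = {"cmax": 1198.0, "auc": 7521.0}


def fold_change(fed_value, fasted_value):
    """Ratio of fed to fasted exposure."""
    return fed_value / fasted_value


cmax_ratio = fold_change(fed["cmax"], fasted["cmax"])
auc_ratio = fold_change(fed["auc"], fasted["auc"])
print(f"Cmax: {cmax_ratio:.2f}-fold, AUC: {auc_ratio:.2f}-fold")
```

On these figures, food roughly doubles exposure: about 2.25-fold for Cmax and about 1.81-fold for AUC.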

Osphena carries a boxed warning for endometrial cancer and cardiovascular disorders. Ospemifene is an ER agonist/antagonist with tissue-selective effects, and in the endometrium it has agonistic effects. There is an increased risk of endometrial cancer in a woman with a uterus who uses unopposed estrogens. Adding a progestin to estrogen therapy reduces the risk of endometrial hyperplasia, which may be a precursor to endometrial cancer. There is a reported increased risk of stroke and deep vein thrombosis in postmenopausal women who received daily oral conjugated estrogen-alone therapy over 7.1 years. Ospemifene should therefore be prescribed for the shortest duration consistent with treatment goals.

Full prescribing information can be found here.

The license holder is Shionogi Inc., and the product website is

Wednesday, 13 March 2013

The results are in - inorganics are out!

A few weeks ago we ran a small poll on how we should deal with inorganic molecules - not just simple sodium salts, but things like organoplatinums, and other compounds with dative bonds, unusual electronic states, etc. The results from you were clear: there was little interest in having a lot of our curation time spent on these. We will continue to collect structures from the source journals, and they will be in the full database, but we won't try and curate the structures, or display them in the interface. They will be appropriately flagged, and nothing will get lost. So there it is, democracy in action.

So for ChEMBL 16 expect fewer issues when you try and load our structures in your own pipelines and systems.

Thanks for all your input.


Registration for diXa course "Microarray Analysis using R and Bioconductor" is now open

We are partners in the diXa FP7 infrastructure grant for chemical safety 'omics data, and as part of this, there is a course aimed at people who could benefit from an introduction to microarray data analysis. This will take place at the EMBL-EBI from 14-16 May 2013. No prior R or Bioconductor experience is required. Registration closes on 21st April 2013.

This course is aimed at researchers and scientists (PhD students, post-docs, staff scientists) who will benefit from an introduction to microarray data analysis and training in how to perform simple analyses using R/Bioconductor. All sessions are a combination of lectures and hands-on work. Prerequisites are a life science degree or equivalent experience, a basic understanding of microarray techniques, and a basic understanding of biostatistics. No prior knowledge of R or Bioconductor is assumed.

Participants will gain a basic understanding of R syntax and the ability to manipulate R objects. After this course students should feel comfortable with the R/Bioconductor environment and be in a position to continue their own explorations of the functionality of R and start using R for their basic biostatistics needs. You will understand why quality control (QC) of microarray data is necessary, run a QC workflow and be able to correctly interpret the results. A range of data exploration methods will be reviewed (PCA, hierarchical clustering, KNN and k-means, scatter plots).

For more information and a full programme:

The "Data infrastructure for chemical safety" (diXa) project aims to support the EU Toxicogenomics Research Community in developing non-animal assays in vitro/in silico for chemical safety, which better predict human toxicity in vivo. The diXa project will design a robust and openly accessible data infrastructure for capturing toxicogenomics data produced by past, current and future EU research projects.

As part of the project we will organise a range of training courses over the next 2 years; this is the first diXa training course open to the general scientific community. diXa training courses will focus on hands-on training using the consortium's unique combination of knowledge and expertise.


USAN stem searching within ChEMBL

Here's a little tip that may be of some use to you. The compound name search feature will also match substrings, so it is easy to type in a USAN/INN stem, and then retrieve all matching compounds - for example, "gliptin" will retrieve all DPP-IV inhibitors with non-proprietary (formal) names. This is pretty useful - of course the substring search functionality is not restricted to USANs/INNs.

A list of the current USAN stems is here. Note, they were never designed to be orthogonal, so the low complexity ones will give a lot of false positives.....
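To make the false-positive point concrete, here is a minimal sketch of substring matching against a handful of names. It runs on a hard-coded list, not against ChEMBL itself; the `match_stem` helper and its `suffix_only` option are hypothetical and only illustrate the idea, and "ast" is used simply as an example of a short fragment.

```python
def match_stem(names, stem, suffix_only=False):
    """Return names containing `stem`; with suffix_only, names ending in it."""
    stem = stem.lower()
    if suffix_only:
        return [n for n in names if n.lower().endswith(stem)]
    return [n for n in names if stem in n.lower()]


# Small illustrative list of non-proprietary names.
names = [
    "Sitagliptin",
    "Vildagliptin",
    "Saxagliptin",
    "Metformin",
    "Montelukast",
    "Atorvastatin",
]

gliptins = match_stem(names, "gliptin")                # specific stem, clean result
ast_sub = match_stem(names, "ast")                     # short fragment also hits a statin
ast_suffix = match_stem(names, "ast", suffix_only=True)

print(gliptins)
print(ast_sub)
print(ast_suffix)
```

A long, specific stem such as "gliptin" picks out only the DPP-IV inhibitors, while a short fragment matched anywhere in the name also pulls in Atorvastatin. Restricting a short stem to the end of the name is one way to cut the noise when post-filtering results yourself.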

Tuesday, 5 March 2013

Group Leader/Postdoc positions in selective kinase inhibitor design

One of our Industry Programme members passed on a listing for some positions they are looking to fill at the new center in Heidelberg - BioMed X. It looks like it may be of interest to many of the ChEMBL-og readers, so I thought I would post it here....

They are looking for a Computational Chemistry/Drug Design group leader for a project "Development of a design software of SELECTIVE kinase inhibitors". The BioMed X Innovation Center in Heidelberg, Germany, constitutes a new class of incubator at the interface between academia and industry where top life science talents from all over the world are jointly working on biomedical innovation outside the pharma box. Young talents from leading academic institutions world-wide are selected in annual assessment centers based on their scientific expertise, creative energy, and passion for product-oriented pre-clinical research & development. Interdisciplinary project teams are collaborating in an open-innovation lab facility in Heidelberg with guidance of experienced mentors from academia and industry while expanding their scientific network and receiving an intensive entrepreneurship and leadership training.

Application details are to be found in the link above.


Monday, 4 March 2013

To Remove Or Not To Remove - That Is The Question

During the course of standard compound curation, I come across problem inorganic compounds. Two examples are Cisplatin and Transplatin. These compounds differ only in the orientation of their complex bonds, but complex bonds cannot be drawn in a standard molfile without causing InChI issues. At the moment, they are kept separate by showing standard bonds between the Pt, Cl and NH3 in Cisplatin, while we have removed the bonds altogether for Transplatin. This is neither an ideal situation nor an accurate structural representation.
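The underlying problem can be shown with a toy model: if only the connectivity is recorded (which atoms are bonded to Pt), the two isomers look identical, and they differ only once the arrangement around the metal is taken into account. The sketch below is plain Python with made-up helper names; it is not how molfiles or InChI encode structures, just an illustration of why connectivity alone is not enough.

```python
def connection_table(ligands):
    """Order-independent list of Pt-ligand bonds (connectivity only)."""
    return sorted(("Pt", ligand) for ligand in ligands)


def chlorides_adjacent(ligands):
    """True if two Cl ligands are neighbours in cyclic order around Pt.

    `ligands` lists the four positions of a square-planar centre going
    round the square, so neighbours in the list are cis to each other.
    """
    n = len(ligands)
    return any(
        ligands[i] == "Cl" and ligands[(i + 1) % n] == "Cl" for i in range(n)
    )


# Ligands listed in order around the square-planar Pt centre.
cisplatin = ["Cl", "Cl", "NH3", "NH3"]
transplatin = ["Cl", "NH3", "Cl", "NH3"]

same_connectivity = connection_table(cisplatin) == connection_table(transplatin)
print(same_connectivity)                # identical bond lists
print(chlorides_adjacent(cisplatin))    # cis: chlorides are neighbours
print(chlorides_adjacent(transplatin))  # trans: chlorides are opposite
```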

Another example is the compound, below left, and how it should look as a complex, right, from the paper:

At the moment, there are approximately 1,800 cases like this, which account for only 0.15% of the entire ChEMBL compound set.

What we are proposing to do is to remove the structures for these complex compounds and to keep only their names and all of the associated biological data. This would then treat them in a similar way to the antibodies and large peptides that we store in ChEMBL.

So, we have set up an online private Doodle Poll for you, our users, to have your vote on whether we should remove the structures and keep the biological data, or leave them as they are.

All comments are welcome.


Friday, 1 March 2013

New Malaria-Data release

We are very pleased to announce that a new release of the malaria-data resource (MMV_2) is now freely available here.

The release was prepared on 1st March 2013 and contains:
  • 362,845 compound records
  • 280,985 compounds
  • 3,288,801 activities
  • 190,243 assays
  • 5,431 targets
  • 24,200 documents
The database contains several new datasets, including OSDD, Harvard and WHO-TDR Malaria screening data. Furthermore, the new interface has adopted the new look and feel features recently introduced in the main ChEMBL interface, such as the redesigned search hits tables and document report card.

As usual, the interface provides compound, assay and target keyword search capabilities, as well as structure-based and sequence-based search functionality for compounds and protein targets respectively. Finally, structure look-ups are offered out-of-the-box via the UniChem cross-references. 

Please see MMV_2_release_notes.txt for full details of all changes in this release.

This is probably the most comprehensive public malaria data resource available but we aspire to broaden the coverage even more. If you or your academic group would like to deposit malaria screening data, please get in touch!

We gratefully acknowledge the support of, and collaboration with, the MMV.

George and Shaun