ChEMBL Resources


Wednesday, 28 March 2012

Target Discovery Institute, Oxford, UK.

I came across a link to the website of the new Oxford Target Discovery Institute today; the institute contains the BHF Centre for Cardiovascular Target Discovery. It looks a great new facility, and a significant strengthening of the UK academic armamentarium for drug discovery!

Structure-Based DrugEBIlity Webinar 4th April

This is a call for people wanting to sign up for the "Structure-Based DrugEBIlity" webinar that will be hosted next Wednesday 4th April at 3.30pm (GMT). It will be a 45 minute webinar where you will be taken through our DrugEBIlity interface. The DrugEBIlity interface is a structure-based druggability search engine where users can survey different types of druggability scores of a given protein structure. Remember to register your interest in our webinars on the Doodle Poll. Make sure that you leave your **email address** as well as your name so that we can send the connection details to you. Any problems, please contact

Blogging from the ACS

I'm at the ACS in San Diego this week, there are three of the ChEMBLites here - two talks down, one to go. It's been a really great meeting, really excellent. I've even managed to sort of stay on UK time, so waking up at about 2am, and then having a good session on the computer before talks start at 8am. My only moan has been that on the computational side, there are too many interesting parallel sessions, and it's difficult to choose where to go. Anyway, I've spent some time with old and new friends, and feel really upbeat about the way that chemoinformatics is impacting our understanding of biology, and how progress is being made in how to design compounds that modulate biological systems. I sense that the availability of large-scale data and large-scale computing are really feeding off each other and allowing things to be developed that could only be imagined a few years ago.

And in a great advance for mankind, internet is free at the conference; seriously well done ACS for sorting this out - I'm fed up of paying serious cash for internet access at conferences.

Blogging really seems to have taken off this year - or at least I'm starting to track live blogging more than I did - Rajarshi Guha of NCTT has been a superstar - here's his twitter feed. A must follow account for those in the field.

Also great has been Carmen Drahl of the ACS itself - here's her twitter feed. Of particular note is the live-blogging she did on first time med. chem. disclosures. Great service - here's the link to the CEN hosted blog. I think there is a great opportunity here to help the world - crowd sourcing and immediately distributing key facts. Carmen has naturally focussed on the chemical structure aspects, but imagine if there were more like-minded people who captured this sort of really valuable data and tweeted or blogged as a community, in a way that others not able to be there could react to the data, analyse it and integrate it in their research. If done right, and more importantly, this could be integrated into a living, public and fully Open resource. If the world was even more perfect, the presenters would immediately post their slides online, with semantically marked up assays and InChIs of all the compounds.... hey, you can tell I'm in California!

The ACS tattoo above is pretty cool; maybe if you ask me nicely when we meet, I can show you mine.

Tuesday, 27 March 2012

Update on EMBO Chemical Biology 2012

The 2012 EMBO Chemical Biology meeting to be held in Heidelberg is looking very good, we had an excellent set of speakers, and now this has been further strengthened with the presence of George Whitesides.  Links to the conference details are here, registration is now open. We are coordinating some arrangements with the MIPTEC conference in Basel this year, and will set up a daily rate to allow attendees at MIPTEC to participate in the drug discovery sessions at the EMBO meeting. Look in a few days on the conference website for more details of this.

Of course, several of the ChEMBL group will be there, so if you'd like to meet any of us there, hear about our plans for the databases, or know more about our research, drop us a line.

Friday, 23 March 2012

Conference: Cutting Edge Approaches to Drug Design 2012

The "Cutting Edge Approaches to Drug Design" (CEADD) Symposia, originally set up by the RSC Molecular Modelling Group and now run by the Molecular Graphics and Modelling Society (MGMS), are a well-established event in the scientific calendar. They are aimed primarily at people with a medicinal chemistry background and should also be of interest to those involved in computational biology, computational chemistry, bioinformatics, cheminformatics, biophysics and structural biology. The emphasis is on interdisciplinarity in drug discovery and also on evolving tools and techniques and their application in understanding biological systems.

Further details are here.

One of the best conferences for modelling in the UK I think!

Thursday, 15 March 2012

ChEMBL Schema & SQL Querying Webinar

This is a call for people wanting to sign up for the "Schema & SQL Querying" webinar that will be hosted next Wednesday 21st March at 3.30pm (GMT).
It will be a 45 minute webinar that will take you through the ChEMBL schema and also how to use SQL queries to extract data from the database.
Remember to register your interest in our webinars on the Doodle Poll. Make sure that you leave your email address as well as your name so that we can send the connection details to you. Any problems, please contact
For those of you who can't make it to this webinar, we will be hosting it again on the 16th of May.

How much does google analytics under-report things?

I was just comparing the built-in page view stats tools in google's blogger software and the stats in google analytics. The former is server side, but they do prune access from spam sites (I think); the latter relies on interactions with the client, cookies, etc., so in the toy way I understand teh interweb, I see this as 'client' side. It's really simple to configure things so that google analytics doesn't track access, and quite a few people do.

So here's an interesting number - from July 1st 2011 to today, there were 184,035 page views (~710 per day) for this blog in google blogger's stats, and only 55,435 page views in google analytics (~214 per day) - don't laugh at how small the numbers are, but now you know. Anyway, google analytics is about 3.3 fold down on actual page views.

I'm sure all bloggers look at their stats, so is this ratio typical?
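
The arithmetic behind those figures can be checked with a few lines of Python. This is only a sketch: it assumes "today" means the post date (15 March 2012) and that both the first and last day are counted, neither of which is stated explicitly above.

```python
from datetime import date

# Page-view totals quoted in the post.
blogger_views = 184_035
analytics_views = 55_435

# Assumption: the period runs from 1 July 2011 to the post date,
# counting both end days.
start, end = date(2011, 7, 1), date(2012, 3, 15)
days = (end - start).days + 1

blogger_per_day = blogger_views / days
analytics_per_day = analytics_views / days
under_report = blogger_views / analytics_views

print(f"Period: {days} days")
print(f"Blogger stats:    {blogger_per_day:.1f} views/day")
print(f"Google Analytics: {analytics_per_day:.1f} views/day")
print(f"Under-reporting:  {under_report:.2f} fold")
```

Under those assumptions the per-day rates and the fold difference agree with the rounded numbers quoted in the post.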

ChEMBL in rdf form using TopBraid.

There's an interesting blog post that I was directed to recently, and it may be of interest to a broader audience. It's here on David Price's blog and details the loading of an early version of ChEMBL into OWL using D2R under TopBraid Composer. Shame there are no updates since the original post...

Monday, 12 March 2012

Tender: Consultancy services for OPS licensing and IP issues for Open PHACTS IMI Project

Open PHACTS is a 3-year EU-funded (IMI) project, targeted to enhance and accelerate data intensive drug research for academic and industry partners. It comprises the development of an innovative open source, open standard and open access platform (application), the Open Pharmacological Space (OPS). The project is driven by the Open PHACTS consortium, composed of 14 European core academic and SME partners in close cooperation with 8 major industry partners from pharmacological areas.

The realization of the OPS platform and its placement in the targeted pharmaceutical area significantly depends on a proper strategic licensing plan considering all licensing and IP issues of the incorporated sources (data, software components).

Main purpose and primary role of required consultancy:
  • The primary role of the consultant is to contribute in depth knowledge with respect to licensing models and IP rights to the Open PHACTS project. Thus, consultancy services are targeted to ensure compliance of the OPS platform with licensing conditions related to the data and software components held in it.
  • To develop a high-level strategic plan for licensing, considering the current and anticipated data sources and software components as well as OPS business case requirements.
  • In depth assessment of the licensing status of each individual data source, further providing a recommendation whether or how this is acceptable for inclusion into the Open PHACTS platform.
  • Engagement in communication of licence model options with partners.
  • To work with data owners to develop alternative licence models (e.g. such as Creative Commons model) where the original one does not fit and the provider is willing to participate.
  • To produce internal and public policy documents regarding the Open PHACTS licensing compliance for the data sources it contains.
  • To represent Open PHACTS in public forums to promote the ability to consume public data for publishing in the platform.
Further details can be found here, deadline for applications is 16th April 2012.

The 22nd Jyväskylä Summer School

We're involved in teaching at the 22nd Jyväskylä Summer School, in Jyväskylä in the fabulous country of Finland. Details of the contents and schedule for the Drug Discovery course held from the 20th to 24th August 2012 can be found here.

Friday, 9 March 2012

New Drug Approvals 2012 - Pt. VII - Lucinactant (Surfaxin™)

ATC code: R07AA30

On March 6, the FDA approved Lucinactant (previously known as KL4-surfactant and ATI 02) for the prevention of infant respiratory distress syndrome (IRDS), which occurs in premature infants with an incidence of 1%. The onset of IRDS is shortly after birth and it typically lasts 2-3 days. Symptoms include shortness of breath, increased heart rate and bluish discoloration of the skin (cyanosis). IRDS can lead to serious complications such as chronic changes of the lung structure, acidosis, intracranial hemorrhage and an incomplete closure of the vascular connection between the aorta and the pulmonary artery (patent ductus arteriosus). In developed countries, IRDS is one of the leading causes of death in the first month after birth.

IRDS is caused by insufficient production of surfactant, a substance that is secreted into the air-filled alveoli of the lung by specialized cells called type II pneumocytes. The lack of surfactant causes an increased surface tension on the interface between the capillary blood vessels (and embedding alveolar tissue) and the air-filled lumen of the alveolus. This results in a contraction of the air-space and obstructs normal breathing.

Lucinactant is a substitute for endogenous surfactant that is administered via an intratracheal tube. Unlike other formulations on the market such as beractant, poractant and calfactant - which are animal derived - Lucinactant is a synthetic formulation consisting of a mixture of phospholipids, fatty acids and a synthetic peptide called sinapultide. Sinapultide is a hydrophobic peptide composed of 16 leucine residues and 5 lysine residues. The peptide is designed to mimic the properties of the surfactant protein SP-B (Uniprot P07988). The remaining components of Lucinactant are palmitic acid and the phospholipids DPPC and POPG.

Palmitic acid (CHEMBL82293) is a fatty acid with molecular weight 256.42 Da.
IUPAC name: Hexadecanoic acid

DPPC (CHEMBL1200737) is a phospholipid with molecular weight 734.06 Da.
IUPAC name: 1,2-dipalmitoyl-sn-glycero-3-phosphocholine

POPG is a phospholipid with molecular weight 747.50 Da.
IUPAC name: 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphoglycerol

Sinapultide is a synthetic peptide of 21 amino acids with molecular weight 2469.46 Da.
CAS: 138531-07-4
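
As a quick consistency check on the sinapultide figures, the residue count and molecular weight can be recomputed from the sequence. The sequence itself is not given in this post; the sketch below assumes the commonly cited KL4 pattern (a lysine followed by four repeats of four leucines and a lysine) and uses standard average residue masses.

```python
from collections import Counter

# Assumption (not stated in the post): sinapultide is the KL4 peptide,
# i.e. K followed by four repeats of LLLLK.
SEQUENCE = "K" + "LLLLK" * 4

# Average residue masses in Da (free amino acid minus one water).
RESIDUE_MASS = {"L": 113.1594, "K": 128.1741}
WATER_MASS = 18.0153  # added once for the free N- and C-termini

counts = Counter(SEQUENCE)
mol_weight = sum(RESIDUE_MASS[aa] for aa in SEQUENCE) + WATER_MASS

print(f"Length: {len(SEQUENCE)} residues")
print(f"Leucines: {counts['L']}, lysines: {counts['K']}")
print(f"Average molecular weight: {mol_weight:.2f} Da")
```

Under that assumption the peptide has 21 residues, and the computed mass lands within a few hundredths of a dalton of the 2469.46 Da quoted above.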

It is of note that Lucinactant is a USAN which is a mixture of four distinct components (without a defined composition in the USAN document). This creates a surprising number of issues in the storage and retrieval of drug information (sigh), and this sort of thing is dragging us towards defining a ChEMBL USAN-like data object for use in our systems (double sigh). Sinapultide has its own distinct assigned USAN, but DPPC, palmitic acid and POPG do not (or not that I can find).

Lucinactant is an off-white gel at the recommended storage temperature of 2-8 °C but becomes a liquid when warmed before use. Each mL of SURFAXIN contains 22.50 mg DPPC, 7.50 mg POPG (sodium salt), 4.05 mg palmitic acid and 0.862 mg sinapultide in tromethamine and sodium chloride. It is recommended that a maximum of 4 doses of 5.8 mL/kg is administered within 48 h after onset of IRDS, at intervals of at least 6 h.
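
To put the per-mL composition and the per-kg dose volume side by side, here is a small illustrative calculation. It uses only the figures quoted above; the variable names are made up for this sketch, and it is plain arithmetic, not dosing guidance.

```python
# Composition quoted in the post, in mg per mL of Surfaxin.
MG_PER_ML = {
    "DPPC": 22.50,
    "POPG (sodium salt)": 7.50,
    "palmitic acid": 4.05,
    "sinapultide": 0.862,
}
DOSE_ML_PER_KG = 5.8  # volume of a single dose, per kg body weight
MAX_DOSES = 4         # maximum number of doses quoted in the post

total_mg_per_ml = sum(MG_PER_ML.values())

# Mass of each component delivered per kg body weight in one dose.
mg_per_kg_per_dose = {
    name: mg * DOSE_ML_PER_KG for name, mg in MG_PER_ML.items()
}
phospholipid_mg_per_kg = (
    mg_per_kg_per_dose["DPPC"] + mg_per_kg_per_dose["POPG (sodium salt)"]
)
max_volume_ml_per_kg = DOSE_ML_PER_KG * MAX_DOSES

for name, mg in MG_PER_ML.items():
    share = 100 * mg / total_mg_per_ml
    print(f"{name:20s} {mg_per_kg_per_dose[name]:7.2f} mg/kg per dose "
          f"({share:.1f}% of listed components)")
print(f"Phospholipids per dose: {phospholipid_mg_per_kg:.1f} mg/kg")
print(f"Maximum total volume:   {max_volume_ml_per_kg:.1f} mL/kg")
```

The two phospholipids together make up most of the listed mass, with sinapultide a small fraction of it.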

Pharmacokinetics of Lucinactant were not studied in humans. A study into the treatment of adult respiratory distress syndrome (ARDS) resulted in an increased mortality rate of treated patients. Lucinactant is not indicated for the treatment of ARDS.

Lucinactant is marketed by Discovery Labs Inc. under the name Surfaxin. Full prescribing information can be found here.

Wednesday, 7 March 2012

ChEMBL Webinars

For those of you who want to sign up to the ChEMBL webinars that are planned for the coming months, we have now set up a Doodle Poll that you can use to register your interest. Please note that the Doodle Poll is hidden, so only the ChEMBL Team can see who has signed up. Make sure that you leave both your **name** and **email address** in the 'Your Name' field so that someone from ChEMBL Help can get back to you with the connection details.

Meeting: ECBS2012 - 3rd European Chemical Biology Symposium

There's a great meeting in central Europe this summer, from the 1st to 3rd July 2012 - the 3rd European Chemical Biology Symposium/ 2nd Vienna Drug Action Conference, held at the Festive Hall of the Austrian Academy of Sciences, Vienna, Austria.

Tuesday, 6 March 2012

Webinar Reminder

This is a quick reminder for all ChEMBL users that the first in our new series of webinars will be starting tomorrow at 15:30 (UK local time). Tomorrow's topic is Interface and Searching. There is still time to register for this webinar and all future ones, if you email us at chembl-help.

Saturday, 3 March 2012

A Dating Site For Chemists and Biologists

Probably everyone who reads the ChEMBL-og will have world-changing ideas - but it's really difficult to find someone to screen a few compounds for you - of course there are CROs who will want to meet, then prepare a quote for you, set up a CDA, receive payment, etc., but cash is difficult to get hold of, and the process will be slow. There are no grant mechanisms for this sort of thing either - imagine - "I'd like funds to test four compounds as potential inhibitors of snoraze" - no chance (at least with the panels I've sat on) too small, too speculative.... The bigger problem though is finding someone with the assay or the compounds.

But, there's a lot of people with compounds to test, and a lot of biologists with assays that are easy to run in their labs, and they have expertise in, but who can't assemble sets of interesting compounds to profile. Why not just use the paradigm of a dating site to matchmake mutually compatible biologists and chemists - if there is a spark, it could develop into a long lasting (collaborative) relationship!

Imagine something like:

Biologist with HMGCoA reductase assay and expertise in cholesterol homeostasis would like to meet chemist with non-statin compounds likely to be brain penetrant to test a cool idea.

Anyway, there's a toy FaceBook group that I've set up - just to get the idea across. I've pitched this as a national thing (so for me that means to the UK, for you somewhere different maybe) - not least that it's a lot easier to ship compounds around within a country than between - and also there's a clear match to downstream funding opportunities. I chose FaceBook, since most of the open LinkedIn groups I'm involved in are train-wrecks of spam and flame-wars.

I think this idea is worth trying, or at least getting some discussion started over - huge thanks to Tom Heightman for our recent discussion on things that needed to be done in Chemical Biology in the UK.

Maybe Google+ is another alternative.

Friday, 2 March 2012

Pfam domain searching of targets in ChEMBL

One thing new in the backend and interface for this release of ChEMBL is the ability to search for targets containing particular Pfam domains. So if you know a Pfam ID, you can enter it in the search box (and then select "Targets") to search for that domain. For example, PF00001 is the Pfam ID for the rhodopsin-like GPCRs.

A couple of important things on this though - the current functionality does exactly what it says - it returns proteins that contain that domain - the compounds do not necessarily (and often in fact do not) bind at that domain. This multidomain, and multi protein target issue is a surprisingly big challenge, and is a big trap for the unwary. So caveat emptor.
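
To make the caveat concrete, here is a toy sketch of what "targets containing a domain" means. It does not use the ChEMBL schema or interface: the records, field names and the `targets_with_domain` helper are all invented for illustration, and the domain combinations are not real proteins.

```python
import re

# Pfam accessions look like "PF" followed by five digits, e.g. PF00001.
PFAM_ID = re.compile(r"PF\d{5}")

# Hypothetical toy records (not ChEMBL data): each target lists the Pfam
# domains it contains and the domain where its compounds are assumed to bind.
TARGETS = [
    {"name": "toy single-domain GPCR",
     "domains": ["PF00001"], "binding_domain": "PF00001"},
    {"name": "toy multi-domain protein",
     "domains": ["PF00001", "PF00069"], "binding_domain": "PF00069"},
    {"name": "toy kinase",
     "domains": ["PF00069"], "binding_domain": "PF00069"},
]


def targets_with_domain(targets, pfam_id):
    """Return every target that contains the given Pfam domain."""
    if not PFAM_ID.fullmatch(pfam_id):
        raise ValueError(f"not a Pfam accession: {pfam_id!r}")
    return [t for t in targets if pfam_id in t["domains"]]


hits = targets_with_domain(TARGETS, "PF00001")
for target in hits:
    binds_here = target["binding_domain"] == "PF00001"
    print(f"{target['name']}: contains PF00001, "
          f"compounds bind there: {binds_here}")
```

Both the single-domain and the multi-domain toy targets are returned, but only for the first does the hit say anything about where the compounds bind, which is exactly the trap described above.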

We do plan, in the next release or two, to provide a prediction of the likely/known compound binding domain (however, for proteins that contain multiple copies of the predicted/known binding domain, it is complicated....).