ChEMBL Resources


Tuesday, 31 December 2013

New Year, New Job? Research Associate in Epidemiology at UCL (ChEMBL related)

As part of a collaboration between the ChEMBL group and UCL, we are looking to appoint to a full-time, three-year position in the Genetic Epidemiology Group, Institute of Cardiovascular Science, one of the component institutes of the UCL Faculty of Population Health Sciences.

The appointee will join an exciting programme of work funded by the UCL National Institute of Health Research Biomedical Research Centre, through its High Impact Award scheme. The appointee will apply bioinformatic expertise to the late-stage development of a new high-density genotyping array designed to support drug target validation and related drug development issues; co-ordinate deployment of the array in a large consortium of highly-phenotyped cohort studies (the University College-London School of Hygiene-Edinburgh-Bristol consortium); undertake statistical analysis of the data; and play a leading role in writing manuscripts reporting findings arising from this work.

The post is based at UCL, but the work will involve collaboration with colleagues in Edinburgh, Bristol, the London School of Hygiene and Tropical Medicine, and the European Bioinformatics Institute at the Wellcome Trust Genome Campus in Hinxton, Cambridgeshire.

Key Requirements - The post would suit someone whose background is in genetic epidemiology, statistical genetics, or bioinformatics, particularly those whose expertise extends across two or more of these areas. We expect the appointee to have a PhD or equivalent degree. We are willing to consider applications from exceptional individuals who have only recently obtained their PhD degree, provided they can demonstrate the requisite skill. We place a strong emphasis on potential for independence and on supporting career development.

We particularly welcome female applicants and those from an ethnic minority, as they are under-represented within UCL at this level.
More details and how to apply are here.

Closing Date - 12 Jan 2014


Friday, 20 December 2013

Conference: CEADD2014 - Modelling water in biological systems, London, March 2014

A one day conference entitled 'Modelling Water in Biological Systems' will be held at the School of Oriental and African Studies (SOAS) in London on Friday, 28 March, 2014. This meeting, organised by the MGMS, is the latest in the 'Cutting Edge Approaches to Drug Design (CEADD)' series.

In recent years, significant progress has been made in probing the role of water molecules in protein-ligand binding.  Hydration is a crucial factor in understanding binding modes, ligand affinities and kinetics.  Modelling tools are becoming available which may offer new insights in this exciting and evolving area of current research. This conference provides a timely overview of some of the main research avenues in this important field.

Thursday, 19 December 2013

ChEMBL Web Service Update 4: A Reminder

This post is to remind users of the ChEMBL Web Services that we will soon be changing the backend to use the new ChEMBL API. Since our initial announcement about the changes, which you can read about here, here and here, we have made some more changes and optimisations, which speed up the services significantly.

We thank everyone for feedback to date and urge anyone else who makes use of the ChEMBL Web Services to test the new version. Remember, they are simple to test: just use the following temporary base URL, and everything should work as if you were using the current live Web Services:

We would like to make the change in January, so please get in touch if you have any questions or experience any problems.

Once we have made the technology switch and are happy that it is working in the wild as expected, we will be doing a complete review of the functionality offered by the current ChEMBL Web Services. So expect some big changes in 2014.

Tuesday, 17 December 2013

UniChem: A resource for compound mapping - use in BioMedBridges

UniChem is a simple database and web service for the InChI-based linkage of chemical structures across various resources. It was initially developed under the EU-OPENSCREEN ESFRI as an approach to link screening data from the planned screening collection to other chemistry resources. The development was then extended under the BioMedBridges project, which spans various Biomedical Sciences (BMS) ESFRIs (such as ELIXIR, BBMRI, etc.).
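To make the idea of InChI-based linkage concrete, here is a minimal sketch of how records from different resources can be joined on a shared standard InChIKey. It is not UniChem's actual implementation or API: the function names, the in-memory index, and the hand-typed example records are all assumptions made for illustration.

```python
from collections import defaultdict

# Hand-typed toy records: (source, source-specific ID, standard InChIKey).
# Illustrative only; these were not retrieved from UniChem.
RECORDS = [
    ("chembl", "CHEMBL25", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"),
    ("drugbank", "DB00945", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"),
    ("chebi", "CHEBI:15365", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"),
    ("chembl", "CHEMBL113", "RYYVLZVUVIJVGH-UHFFFAOYSA-N"),
    ("drugbank", "DB00201", "RYYVLZVUVIJVGH-UHFFFAOYSA-N"),
]


def build_index(records):
    """Group source identifiers by InChIKey, and map each identifier to its key."""
    by_key = defaultdict(set)
    key_of = {}
    for source, compound_id, inchikey in records:
        by_key[inchikey].add((source, compound_id))
        key_of[(source, compound_id)] = inchikey
    return by_key, key_of


def cross_references(records, source, compound_id):
    """Return identifiers in other records that share the same InChIKey."""
    by_key, key_of = build_index(records)
    inchikey = key_of.get((source, compound_id))
    if inchikey is None:
        return []  # unknown identifier: nothing to link
    return sorted(x for x in by_key[inchikey] if x != (source, compound_id))


if __name__ == "__main__":
    print(cross_references(RECORDS, "chembl", "CHEMBL25"))
```

Because the join key is derived from the structure itself, no resource needs to know about any other resource's identifiers; each one only has to publish its own structures in a standard form.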

It has proved remarkably useful to us as well, and will be the future home of regularly updated feeds of compound structures from SureChEMBL. This will allow rapid novelty checking of patent structures across the component data sources. A side effect of this is, of course, that the compounds in any of the BioMedBridges partner ESFRIs immediately gain patent data integration. For us, this synergy, and the snowball effect of binding resources together using simple open standards, is one of the great joys of our work!

Follow @SureChEMBL for ongoing updates on status of the SureChEMBL resource.

Friday, 13 December 2013

Notes from Rita's Talk Yesterday.

Rita gave a talk on her recent drug target work yesterday on campus, and Jenny Cham took notes; aren't they great?


Wednesday, 11 December 2013

SureChEMBL - Chemical Structure Information in Patents

Today we have announced that we are taking over the running of the SureChem system from Digital Science. We have renamed this SureChEMBL to reflect the history and provenance of the technology and engineering, but also to align it with its new home and future. We like the name, and hope you do too. We are delighted that this has happened - Nicko and the team at Digital Science have been great, and the more we have dug into how it works, the more we have appreciated the design and vision that they had.

If there is one consistent piece of feedback we get about ChEMBL, it is encouragement to add patent data to what we do. So now we have, but because the data from patents is different in detail from that reported in the published literature, we will keep the databases separate, but closely integrated.

For those of you who are already SureChem users, the functionality and how it works will be familiar. For those who are not, SureChEMBL takes feeds of full-text patents, identifies chemical objects from either the in-line text or from images, and adds 2-D chemical structures. These are then loaded into a database that is searchable by chemical structure, so you can do substructure searching, similarity searching and so forth - all the good things you'd expect from a chemical database. This chemical search functionality is unavailable from the public, published patent documents, and is really essential for anyone seriously using the patent literature. Oh, and the system does this live, so as patents are published, they are processed and added to the system - the delay between publication and structures being available in SureChEMBL is about a day when converted from text, and a few days when converted from image sources.
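For readers new to similarity searching, the sketch below shows the general idea using the Tanimoto coefficient over fingerprint bit sets. It is a generic illustration, not a description of how SureChEMBL or its chemical cartridge performs searches: the fingerprints are made-up sets of "on" bit positions, and the compound names, function names and threshold are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def similarity_search(query, library, threshold=0.5):
    """Return (name, score) pairs scoring at or above threshold, best first."""
    scored = ((name, tanimoto(query, fp)) for name, fp in library.items())
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up fingerprints for three hypothetical compounds.
LIBRARY = {
    "cmpd_A": frozenset({1, 4, 7, 9}),
    "cmpd_B": frozenset({1, 4, 8}),
    "cmpd_C": frozenset({2, 3, 5}),
}

if __name__ == "__main__":
    print(similarity_search(frozenset({1, 4, 7}), LIBRARY))
```

In a real system the fingerprints would be computed from the 2-D structures by a cheminformatics toolkit, and the comparison would run inside the database rather than in a Python loop.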

SureChEMBL is hosted on the cloud - it's quite a complicated AWS solution, and it will take a few months for us to assume complete control of all the various parts and, importantly, to keep things running smoothly behind the scenes, so that continuous access to fresh patent data is maintained.

SureChEMBL uses a number of third-party software products in its operation, and arranging the licenses and permissions has been complex, and is still ongoing. The third-party software and data feeds used in SureChEMBL include:

Name to structure: ChemAxon, ACD/Labs, Perkin Elmer, OpenEye, OPSIN, NextMove
Chemical cartridge: ChemAxon
Image to structure: Key Module
Patent data: FairView (IFI Claims) – processed patents, TwinDolphin – patent PDFs

These guys have all been a pleasure to work with so far, and SureChEMBL is a great showcase of their respective technologies and data.

We will host the system at the primary URLs and also at - at the moment, these redirect to, but as we switch things over they will point to servers provisioned by our team, so please start using these new URLs for future access, although the original URLs will continue to work into the future.

One of the more complicated things to transfer is the user accounts system - we can't simply transfer them over - and so have a plan to mail batches of users once a new sign-on system is in place in order to invite them to sign up to the new user account system. If you are not currently a registered user, please sign up with the current system, and we'll invite you to transfer over to our sign-on system once things are ready.

The EMBL-EBI has a broad range of life-science chemistry resources, and we integrate across chemistry-related content using a chemical structure integration system called UniChem. In overview the EMBL-EBI chemistry resources include the following.

The future? - well the future is exciting, and we have lots of ideas to actively develop the SureChEMBL system. To be clear though, doing this will rely on us getting funding, and we're working hard on this. Some of the ideas we have for SureChEMBL include:
  • Put SureChEMBL chemical content into UniChem
  • Add sequence searching
  • Add disease term, animal model, etc. indexing
  • Development of community KNIME nodes
  • Add links to/from Europe PMC
  • Ligand Ensemble-based mapping of ChEMBL literature to patents
  • Refactor interface for EMBL look and feel
  • Extend image extraction retrospectively from 2006 using spot priced compute from AWS
  • Provide weekly/monthly feed of patent structures to PubChem
  • Add chemical structure tagging & search to full text content of Europe PMC
But one of the first things we plan to do is index genes and targets (in collaboration with local SME SciBite) and provide an RDF form of the data and REST web services as part of the IMI OpenPHACTS project.

In the new year, we will run a webinar on SureChEMBL (which we will announce here), but in the meantime we're very happy to take questions on the SureChEMBL support email address surechembl-help (at)