

Showing posts from 2019

Merry Christmas and ChEMBL_26 coming soon!

The ChEMBL team will be heading off for Christmas soon, but just before we do, we wanted to share some updates... First, thanks to all of our many users and collaborators; we wish you all a happy holiday season and a productive 2020! Thanks also to everyone who helped us celebrate 10 years of ChEMBL at our symposium in October. For those who were unable to make it on the day, many of the talks and posters are available here. Over the last few months we've been busy working on ChEMBL_26, which we plan to release early next year. There will be some important changes in this release: We are now using RDKit for almost all of our compound-related processing. For the first time in ChEMBL_26, this will include compound standardisation (look out for more info on this in the new year), salt-stripping, generation of canonical SMILES, structural alerts, substructure searches and similarity searches (via FPSim2). Therefore, all molecules have been reprocessed and you may notic

Mechanism of Action and Drug Indication data on the interface.

Two new 'Browse' pages have been added to the interface: Browse Drug Mechanisms and Browse Drug Indications. Users can access these two pages directly to explore all the data, or land on them from drugs, compounds and targets in ChEMBL.
Accessing all the data from the main page
The 'circles' visualisation on the main page shows a summary of the entities in ChEMBL. Circles for Drug Mechanisms of Action and Drug Indications have been added. Clicking on a circle takes you to a page that allows you to explore the corresponding entity. [Figure: visualisation that summarises the entities in ChEMBL; Drug Mechanisms of Action and Drug Indications are now included.] The Browse Drug Mechanisms and Browse Drug Indications pages allow you to use filters, link to other entities, and download the data in the same way as the other 'Browse' pages. [Figures: all Drug Mechanism data; all Drug Indication data.] Acce

New text filter on the ChEMBL interface

A new text filter has been added to the search results and the 'Browse' pages of the interface. This filter is shown as a small search bar at the top-right of tables and card pages. It can be used as a simple and fast way to filter a set of items. The filter appends a new query to the current query, matching the term entered against all the available non-numeric fields. It is based on the query string query of Elasticsearch, so wildcards can be used in the search box. To see an example of how it works, you can follow these steps: (1) Go to the Browse Drugs page. (2) Use the filters on the left to select only Phase 4 drugs with no Rule of Five violations. (3) Enter the term '*antibacterial*' in the search box and click on the search button. It will match the term on the following fields: Parent Molecule ChEMBL ID, Synonyms, Research Codes, Applicants, USAN Stem, ATC Codes, USAN Stem Defini
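To make the idea of "appending a query" concrete, here is a minimal sketch of how a text term could be combined with an existing Elasticsearch query using a `query_string` clause. It is not the interface's actual code: the function name `append_text_filter` and the field names (`max_phase`, `num_ro5_violations`, `synonyms`, `usan_stem_definition`) are hypothetical and used only for illustration.

```python
import json


def append_text_filter(current_query, term, text_fields=None):
    """Wrap an existing query so that `term` must also match.

    `text_fields` is an optional list of (non-numeric) fields to search;
    if omitted, Elasticsearch falls back to its default fields.
    """
    text_clause = {"query_string": {"query": term}}
    if text_fields:
        text_clause["query_string"]["fields"] = list(text_fields)
    # Both the original query and the new text clause have to match.
    return {"bool": {"must": [current_query, text_clause]}}


# Hypothetical current query: Phase 4 drugs with no Rule of Five violations.
current = {
    "bool": {
        "filter": [
            {"term": {"max_phase": 4}},
            {"term": {"num_ro5_violations": 0}},
        ]
    }
}

# Wildcards are passed through as part of the query string.
combined = append_text_filter(
    current, "*antibacterial*", ["synonyms", "usan_stem_definition"]
)
print(json.dumps({"query": combined}, indent=2))
```

The resulting dictionary is the kind of request body that could be sent to an Elasticsearch search endpoint; the existing filters stay untouched and the text term only narrows the result set further.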

CuPy example for CUDA based similarity search in Python

CuPy is a really nice library developed by a Japanese startup and supported by NVIDIA that makes it easy to run CUDA code in Python using NumPy arrays as input. It also provides interoperability with Numba (a just-in-time Python compiler) and DLPack (a tensor specification used in PyTorch, the deep learning library). CUDA is a parallel computing platform and application programming interface that allows GPUs to be used for general-purpose computing, not only graphics. Just to give an idea of the level of parallelization that can be achieved with it, a not very expensive consumer GPU like the NVIDIA GTX 1080 comes with 2560 CUDA cores. Because at ChEMBL we love anything that makes Python fast and is well integrated with NumPy, we couldn't resist giving it a try! Let's go through an example to see how it works... Google Colab notebook. Colab provides the option to run notebooks on a GPU, and CuPy is already installed in the default Python environment :)
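For readers without a GPU at hand, the sketch below shows the kind of array computation a fingerprint similarity search involves, written with plain NumPy so that it runs anywhere. It is not the notebook's CUDA code: the Tanimoto coefficient, the toy 4-bit fingerprints and the function name `tanimoto_search` are assumptions made for illustration. CuPy aims to mirror much of the NumPy API, so expressions like these are the sort of code that can often be moved to the GPU by using `cupy` arrays instead, but that portability should be checked rather than taken for granted.

```python
import numpy as np


def tanimoto_search(query, db, threshold):
    """Return (indices, scores) of db rows with Tanimoto >= threshold, best first.

    `query` is a 1-D binary fingerprint, `db` a 2-D array with one
    binary fingerprint per row.
    """
    query = query.astype(bool)
    db = db.astype(bool)
    common = (db & query).sum(axis=1)               # bits set in both
    total = db.sum(axis=1) + query.sum() - common   # bits set in either
    # Guard against 0/0 when both fingerprints are empty.
    scores = np.where(total > 0, common / np.maximum(total, 1), 0.0)
    hits = np.flatnonzero(scores >= threshold)
    order = np.argsort(-scores[hits], kind="stable")
    return hits[order], scores[hits][order]


# Toy database of four 4-bit fingerprints (illustrative data only).
db = np.array(
    [[1, 1, 0, 0],
     [1, 1, 1, 0],
     [0, 0, 1, 1],
     [0, 0, 0, 0]],
    dtype=np.uint8,
)
query = np.array([1, 1, 0, 0], dtype=np.uint8)

idx, sims = tanimoto_search(query, db, threshold=0.5)
print(idx, sims)
```

Every row of the database is compared with the query independently, which is exactly the type of workload that maps well onto thousands of CUDA cores.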

Multi-task neural network on ChEMBL with PyTorch 1.0 and RDKit

The use and application of multi-task neural networks is growing rapidly in cheminformatics and drug discovery. Examples can be found in the following publications:
- Deep Learning as an Opportunity in Virtual Screening
- Massively Multitask Networks for Drug Discovery
- Beyond the hype: deep neural networks outperform established methods using a ChEMBL bioactivity benchmark set
But what is a multi-task neural network? In short, it's a kind of neural network architecture that can optimise multiple classification/regression problems at the same time while taking advantage of their shared description. This blog post gives a great overview of their architecture. All the networks in the references above implement the hard parameter sharing approach. So, having a set of activities relating targets and molecules, we can train a single neural network as a binary multi-label classifier that will output the probability of activity/inactivity for each of the targets (tasks) for a given q
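As a rough illustration of hard parameter sharing, the sketch below implements only the forward pass and loss of a tiny multi-label classifier in NumPy: one hidden layer shared by all tasks, followed by one sigmoid output per target. It is not the post's PyTorch model. The layer sizes, the random input data and the function names are hypothetical, and the label mask (used to ignore molecule/target pairs with no measurement) is an assumption about how missing data might be handled.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_params(n_features, n_hidden, n_tasks):
    """Randomly initialise one shared layer and one output unit per task."""
    return {
        "W_shared": rng.normal(0, 0.1, (n_features, n_hidden)),
        "b_shared": np.zeros(n_hidden),
        "W_heads": rng.normal(0, 0.1, (n_hidden, n_tasks)),
        "b_heads": np.zeros(n_tasks),
    }


def forward(params, x):
    """Return per-task activity probabilities for each input row."""
    # Shared representation: every task reads the same hidden layer.
    hidden = np.maximum(0.0, x @ params["W_shared"] + params["b_shared"])
    # Task-specific outputs: one logit (column) per target.
    logits = hidden @ params["W_heads"] + params["b_heads"]
    return 1.0 / (1.0 + np.exp(-logits))


def masked_bce(probs, labels, mask):
    """Binary cross-entropy averaged over the entries where mask == 1."""
    eps = 1e-7
    p = np.clip(probs, eps, 1 - eps)
    loss = -(labels * np.log(p) + (1 - labels) * np.log(1 - p))
    return (loss * mask).sum() / mask.sum()


# Toy data: 5 molecules, 16-bit fingerprints, 3 targets (all hypothetical).
x = rng.integers(0, 2, (5, 16)).astype(float)
labels = rng.integers(0, 2, (5, 3)).astype(float)
mask = np.ones((5, 3))
mask[0, 1] = 0  # pretend molecule 0 was never tested on target 1

params = init_params(n_features=16, n_hidden=8, n_tasks=3)
probs = forward(params, x)
loss = masked_bce(probs, labels, mask)
print(probs.shape, loss)
```

Because the hidden layer is shared, gradients from every task would update the same weights during training, which is what lets data-poor targets benefit from data-rich ones.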

Job opportunities in the ChEMBL Group

We have two exciting opportunities for scientists to come and work with the ChEMBL team at the Wellcome Genome Campus in Hinxton near Cambridge. If you've used ChEMBL in the past, perhaps now is the chance to come and shape its future. Even if you haven't, this is a great place to work, and in both positions you will collaborate not only with the people developing the ChEMBL resources but also with our collaborators here at Hinxton and around Europe. These include the Open Targets project and EU-funded toxicology projects such as EU-ToxRisk and eTRANSAFE. We are looking for: (1) A talented chemoinformatician to work on methods for the annotation, searching and visualization of toxicologically relevant data. You will develop pipelines and tools to enable the better prediction and assessment of the toxicity of pharmaceutical and environmental chemicals. Closing date: 19th May 2019. More details here. (2) A protein computational scientist to develop, assess and validate methods for

ChEMBL 25 and new web interface released

We are pleased to announce the release of ChEMBL 25 and our new web interface. This version of the database, prepared on 10/12/2018, contains:
2,335,417 compound records
1,879,206 compounds (of which 1,870,461 have mol files)
15,504,603 activities
1,125,387 assays
12,482 targets
72,271 documents
Data can be downloaded from the ChEMBL ftp site. Please see the ChEMBL_25 release notes for full details of all changes in this release.
DATA CHANGES SINCE THE LAST RELEASE
# Deposited Data Sets:
Kuster Lab Chemical Proteomics Drug Profiling (src_id = 48, Document ChEMBL_ID = CHEMBL3991601): Data have been included from the publication: The target landscape of clinical kinase drugs. Klaeger S, Heinzlmeir S and Wilhelm M et al (2017), Science, 358-6367
# In Vivo Assay

ChEMBL is 10 years old in 2019!

In 2019 we celebrate the 10th anniversary of the first public release of the ChEMBL database. To recognise this important landmark we are organising a one-day symposium to celebrate the work achieved by ChEMBL during its first ten years, and to look forward to its future. Save the date - Tuesday 8th October 2019. The symposium will be held on Tuesday 8th October in the Francis Crick Auditorium on the Wellcome Genome Campus, Hinxton, Cambridge, UK. A series of talks from invited speakers will be followed by a celebratory birthday cake and drinks reception. During the breaks, the poster session will be a great opportunity to catch up with other users of the ChEMBL database and chat to colleagues, co-workers and others to find out more about how the database is being used. For the programme of invited talks, and more information on how to register, see

Target prediction, QSAR and conformal prediction

You know that in the ChEMBL group, we love to play with the data we collect! Back in April 2014, we started to work on a target prediction tool. Wow! This was almost 5 years ago! Since then, we have continued to update the tool for each new ChEMBL release, providing you with the actual models and the results of the predictions for the drug molecules on the ChEMBL website. The good news is that these target predictions are not dead and a successor is on its way! First, we would like to introduce you to some closely related work. You may have heard about conformal prediction (CP). If not, it is a machine learning framework developed to associate confidence with predictions. I personally consider this a requirement for decision making. Basically, you train a model as you would do in QSAR, but then you first predict a so-called calibration set, for which you know the actual values. For each of these observations you obtain two probabilities: one for the active and one for the
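To give a feel for how a calibration set turns probabilities into confidence, here is a minimal sketch of a class-conditional (Mondrian) inductive conformal predictor. It follows the textbook recipe rather than any code from the ChEMBL group, and it assumes that the conformity score of a compound is simply the model's predicted probability for the class in question. The function names and all the numbers are hypothetical.

```python
def conformal_p_value(calibration_scores, new_score):
    """p-value of a new conformity score against a calibration set.

    Counts how many calibration scores are at most as conforming as the
    new one; the +1 terms account for the new observation itself.
    """
    n_le = sum(1 for s in calibration_scores if s <= new_score)
    return (n_le + 1) / (len(calibration_scores) + 1)


def predict_set(cal_scores_by_class, new_probs, significance):
    """Return every label whose p-value exceeds the significance level."""
    labels = []
    for label, cal_scores in cal_scores_by_class.items():
        if conformal_p_value(cal_scores, new_probs[label]) > significance:
            labels.append(label)
    return labels


# Hypothetical calibration set: for compounds known to be active, the
# probability the model gave to "active"; likewise for the inactives.
calibration = {
    "active": [0.9, 0.8, 0.7, 0.6],
    "inactive": [0.95, 0.85, 0.75, 0.65],
}

# Hypothetical model output for a new compound.
new_compound = {"active": 0.75, "inactive": 0.25}

p_active = conformal_p_value(calibration["active"], new_compound["active"])
p_inactive = conformal_p_value(calibration["inactive"], new_compound["inactive"])
print(p_active, p_inactive)

print(predict_set(calibration, new_compound, significance=0.25))
print(predict_set(calibration, new_compound, significance=0.1))
```

A stricter significance level (a smaller accepted error rate) makes the predictor more cautious: it keeps more labels in the prediction set, and a set containing both labels signals that the model cannot tell the classes apart at that confidence.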