Posts

ChEMBL webinar

We’ve now retired the old ChEMBL website, but the new interface, with a range of improved features and functionality, is fully up and running! If you’re new to ChEMBL or eager to learn more, register for our webinar on March 11th at 15:30. Don’t forget that further information can also be found in our ChEMBL Quick Tour, ChEMBL-og and FAQs. Questions? Send us a message through the Helpdesk.

ChEMBL 26 Released

We are pleased to announce the release of ChEMBL_26. This version of the database, prepared on 10/01/2020, contains:

* 2,425,876 compound records
* 1,950,765 compounds (of which 1,940,733 have mol files)
* 15,996,368 activities
* 1,221,311 assays
* 13,377 targets
* 76,076 documents

You can query the ChEMBL 26 data online via the ChEMBL Interface, and you can also download the data from the ChEMBL FTP site. Please see the ChEMBL_26 release notes for full details of all changes in this release.

Changes since the last release:

* Deposited Data Sets: CO-ADD antimicrobial screening data: Two new data sets have been included from the Community for Open Access Drug Discovery (CO-ADD). These data sets comprise the screening of the NIH NCI Natural Product Set III in the CO-ADD assays (src_id = 40, Document ChEMBL_ID = CHEMBL4296183, DOI = 10.6019/CHEMBL4296183) and the screening of the NIH NCI Diversity Set V in the CO-ADD assays (src_id = 40, Document ChEMBL_ID = CHEMBL4296182, DOI = 10.601...

ChEMBL Compound Curation Pipeline

At the end of last year we mentioned that we are now using RDKit for our compound structure processing (see here). Most excitingly, as part of this we have been working with Greg Landrum, the developer of RDKit, over the last year to reimplement our curation pipeline using RDKit. The pipeline includes three functions:

1. Check: identifies and validates problem structures before they are added to the database
2. Standardize: standardises chemical structures according to a set of predefined ChEMBL business rules
3. GetParent: generates parent structures of multi-component compounds based on a set of rules and a defined list of salts and solvents

We are now pleased to announce that we are making all the code from this project freely available on GitHub. The functions can also now be used through our ChEMBL Beaker API. A live notebook with examples is available here. For ChEMBL26 (shortly to be released) we have created ...
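To make the GetParent idea concrete, here is a toy sketch in plain Python. It is not the ChEMBL pipeline: it works on raw SMILES strings instead of RDKit molecules, the salt and solvent list is a hypothetical handful of fragments, and the function name `get_parent_smiles` is invented for this illustration. It only shows the general principle of dropping listed counter-ions and solvents from a multi-component record.

```python
# Toy illustration only: the real pipeline works on RDKit molecules and a
# curated salt/solvent list, not on string matching of SMILES fragments.

# Hypothetical, deliberately tiny salt/solvent list (assumption).
SALTS_AND_SOLVENTS = {"Cl", "[Na+]", "[Cl-]", "[K+]", "O"}


def get_parent_smiles(smiles: str) -> str:
    """Return the SMILES with listed salt/solvent components removed."""
    # A "." in SMILES separates disconnected components.
    components = smiles.split(".")
    kept = [c for c in components if c not in SALTS_AND_SOLVENTS]
    # Design choice of this sketch: if every component is on the list,
    # return the input unchanged instead of an empty structure.
    if not kept:
        return smiles
    return ".".join(kept)


aspirin_hcl = "CC(=O)Oc1ccccc1C(=O)O.Cl"
parent = get_parent_smiles(aspirin_hcl)
```

Exact string matching would miss equivalent SMILES written differently, which is one reason a real implementation needs a cheminformatics toolkit for this step.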

cbl_migrator is now open source!

cbl_migrator is the Python tool we developed to migrate the ChEMBL database from our primary Oracle instance to PostgreSQL, MySQL and SQLite. We first developed it to generate our dumps for those RDBMSs, but we also recently started using it to populate the new PostgreSQL instances that serve our API and web interface. It is built on top of the great SQLAlchemy library and its source code is now available on our GitHub.
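For readers unfamiliar with this kind of migration, the sketch below shows the basic idea of copying a table's schema and rows from one database to another. It is an assumption-laden illustration, not cbl_migrator's API: it uses only the standard-library `sqlite3` module on two in-memory databases, whereas the real tool uses SQLAlchemy and works across different database engines. The `copy_table` helper and the `demo_compounds` table are hypothetical.

```python
import sqlite3


def copy_table(src: sqlite3.Connection, dst: sqlite3.Connection, table: str) -> int:
    """Copy one table's schema and rows from src to dst; return the row count."""
    # SQLite stores each table's CREATE statement in sqlite_master.
    row = src.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    if row is None:
        raise ValueError(f"no such table: {table}")
    dst.execute(row[0])  # recreate the schema in the destination

    rows = src.execute(f'SELECT * FROM "{table}"').fetchall()
    if rows:
        placeholders = ", ".join("?" for _ in rows[0])
        dst.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    dst.commit()
    return len(rows)


# Usage with two in-memory databases and a hypothetical demo table.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE demo_compounds (id INTEGER PRIMARY KEY, name TEXT)")
source.executemany(
    "INSERT INTO demo_compounds VALUES (?, ?)",
    [(1, "aspirin"), (2, "caffeine")],
)
source.commit()

target = sqlite3.connect(":memory:")
copied = copy_table(source, target, "demo_compounds")
```

Reusing the source's CREATE statement only works here because both ends are SQLite; moving between different engines requires translating column types and constraints, which is the part a library such as SQLAlchemy helps with.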

New ChEMBL ligand-based target predictions docker image available

One year ago we published a new version of our target prediction models and since then we've been working on its implementation for the upcoming ChEMBL 26 release. What did we do? First of all, we re-trained the models with the LightGBM library instead of scikit-learn. By doing this and tuning the parameters a bit, our prediction timing improved by 2 orders of magnitude while keeping comparable prediction power. Having quicker models allowed us to easily implement a simple web service providing real-time predictions. Since we are currently migrating to a more sustainable Kubernetes infrastructure, it made sense to us to write the small target prediction web service directly as a cloud native app. We then decided to give OpenFaaS a try as a platform to deploy machine learning models. OpenFaaS is a framework for building serverless functions with Docker and Kubernetes. It provides templates for deploying functions as REST endpoints in many different programming lang...
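As a rough sketch of what a function-style prediction service can look like, the handler below follows the shape used by OpenFaaS's classic Python template, where a `handle(req)` function receives the request body as a string and returns the response. Everything else is an assumption made for illustration: the JSON request format with a `smiles` field, the response layout, and the `predict_targets` stub, which stands in for a trained model and returns no predictions. It is not the code of the ChEMBL service.

```python
import json


def predict_targets(smiles: str) -> list:
    """Hypothetical stub standing in for a trained model; returns no predictions."""
    # A real service would featurise the molecule and score it with the
    # trained models here.
    return []


def handle(req: str) -> str:
    """Parse a JSON request body and return predictions as a JSON string."""
    try:
        payload = json.loads(req)
        smiles = payload["smiles"]
    except (ValueError, KeyError, TypeError):
        # Invalid JSON, a missing field, or a non-object body.
        return json.dumps({"error": "expected a JSON body with a 'smiles' field"})
    return json.dumps({"smiles": smiles, "predictions": predict_targets(smiles)})


response = handle('{"smiles": "CCO"}')
```

Keeping the model call behind a single function makes it straightforward to swap the stub for a real predictor without touching the request handling.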

Merry Christmas and ChEMBL_26 coming soon!

The ChEMBL team will be heading off for Christmas soon, but just before we do, we wanted to share some updates... First, thanks to all of our many users and collaborators; we wish you all a happy holiday season and a productive 2020! Thanks also to everyone who helped us celebrate 10 years of ChEMBL at our symposium in October. For those who were unable to make it on the day, many of the talks and posters are available here. Over the last few months we've been busy working on ChEMBL_26, which we plan to release early next year. There will be some important changes in this release: We are now using RDKit for almost all of our compound-related processing. For the first time in ChEMBL_26, this will include compound standardisation (look out for more info on this in the new year), salt-stripping, generation of canonical SMILES, structural alerts, substructure searches and similarity searches (via FPSim2). Therefore, all molecules have been reprocessed and you may ...

Mechanism of Action and Drug Indication data on the interface.

Two new 'Browse' pages have been added to the interface: Browse Drug Mechanisms and Browse Drug Indications. Users can now access these two pages directly to explore all the data. Alternatively, they can reach these pages from drugs, compounds and targets in ChEMBL.

Accessing all the data from the main page

The 'circles' visualisation on the main page shows a summary of the entities in ChEMBL. Circles for Drug Mechanisms of Action and Drug Indications have been added. Clicking on a circle takes you to a page where you can explore the corresponding entity.

[Figure: Visualisation that summarises the entities in ChEMBL; Drug Mechanisms of Action and Drug Indications are now included.]

The Browse Drug Mechanisms and Browse Drug Indications pages allow you to use filters, link to other entities, and download the data in the same way as the other 'Browse' pages.

[Figure: All Drug Mechanism data.]

[Figure: All Drug Indication data.]

Acce...