Posts

Showing posts from February, 2020

ChEMBL Compound Curation Pipeline

At the end of last year we mentioned that we are now using RDKit for our compound structure processing (see here). Most excitingly, as part of this we have spent the last year working with Greg Landrum, the developer of RDKit, to reimplement our curation pipeline using RDKit.

The pipeline includes three functions:

1. Check: identifies and validates problem structures before they are added to the database
2. Standardize: standardises chemical structures according to a set of predefined ChEMBL business rules
3. GetParent: generates parent structures of multi-component compounds based on a set of rules and a defined list of salts and solvents

We are now pleased to announce that we are making all the code from this project freely available on GitHub. The functions can also now be used through our ChEMBL Beaker API. A live notebook with examples is available here. For ChEMBL26 (shortly to be released) we have created new molfiles for all the ChEM
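To give a feel for what the GetParent step does, here is a deliberately simplified sketch. It is not the pipeline's code: the real functions work on molfiles through RDKit, whereas this toy version only splits a dot-separated SMILES string and drops components that appear in a small, hypothetical salt and solvent list. The function name `get_parent_smiles` and the list contents are assumptions made for illustration.

```python
# Toy illustration only: the real pipeline processes molfiles with RDKit.
# This set is a hypothetical stand-in, not ChEMBL's salt/solvent list.
HYPOTHETICAL_SALTS_AND_SOLVENTS = {"[Na+]", "[Cl-]", "Cl", "O"}


def get_parent_smiles(smiles: str) -> str:
    """Drop listed salt/solvent components from a dot-separated SMILES string."""
    components = smiles.split(".")
    kept = [c for c in components if c not in HYPOTHETICAL_SALTS_AND_SOLVENTS]
    # If every component is in the list, return the input unchanged
    # rather than an empty structure.
    return ".".join(kept) if kept else smiles


# Aspirin written as a hydrochloride, purely as an input example.
parent = get_parent_smiles("CC(=O)Oc1ccccc1C(=O)O.Cl")
```

Plain string matching like this only works when the components are written exactly as they appear in the list, which is one reason a cheminformatics toolkit is needed for real data.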

cbl_migrator is now open source!

cbl_migrator is the Python tool we developed to migrate the ChEMBL database from our primary Oracle instance to PostgreSQL, MySQL and SQLite. We first developed it to generate our dumps for these RDBMSs, but we also recently started using it to populate the new PostgreSQL instances that serve our API and web interface. It is built on top of the great SQLAlchemy library, and its source code is now available on our GitHub.
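The general idea of a table-by-table migration can be sketched with nothing but the standard library. This is not cbl_migrator's implementation or API: it swaps SQLAlchemy and Oracle for two in-memory SQLite databases, and the `copy_tables` function and the `compound` table are hypothetical names used only for this example.

```python
import sqlite3


def copy_tables(src: sqlite3.Connection, dst: sqlite3.Connection) -> dict:
    """Copy every user table from src to dst and return row counts per table."""
    counts = {}
    # sqlite_master stores the original CREATE TABLE statement for each table.
    tables = src.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for name, create_sql in tables:
        dst.execute(create_sql)  # recreate the table definition in the target
        rows = src.execute(f'SELECT * FROM "{name}"').fetchall()
        if rows:
            placeholders = ", ".join("?" * len(rows[0]))
            dst.executemany(
                f'INSERT INTO "{name}" VALUES ({placeholders})', rows
            )
        counts[name] = len(rows)
    dst.commit()
    return counts


# Build a tiny source database, then copy it into an empty target.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE compound (id INTEGER PRIMARY KEY, name TEXT)")
src.executemany(
    "INSERT INTO compound VALUES (?, ?)", [(1, "aspirin"), (2, "caffeine")]
)
src.commit()

dst = sqlite3.connect(":memory:")
counts = copy_tables(src, dst)
```

A migration between different database engines also has to translate column types and constraints, which is where a library such as SQLAlchemy does the heavy lifting; this sketch sidesteps that by using the same engine on both sides.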