At the end of last year we mentioned that we are now using RDKit for our compound structure processing (see here). Most excitingly, as part of this work we have spent the last year collaborating with Greg Landrum, the developer of RDKit, to reimplement our curation pipeline using RDKit.

The pipeline includes three functions:

1. Check: identifies and validates problem structures before they are added to the database.
2. Standardize: standardises chemical structures according to a set of predefined ChEMBL business rules.
3. GetParent: generates parent structures of multi-component compounds based on a set of rules and a defined list of salts and solvents.

We are now pleased to announce that we are making all the code from this project freely available on GitHub. The functions can also now be used through our ChEMBL Beaker API, and a live notebook with examples is available here.

For ChEMBL26 (shortly to be released) we have created new molfiles for all the ChEM
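To illustrate the idea behind GetParent (stripping salts and solvents from a multi-component compound), here is a deliberately simplified sketch. It is not the ChEMBL implementation: it works on dot-separated SMILES strings, uses exact string matching, and assumes the salt/solvent SMILES are written consistently (e.g. already canonicalised). The list `SALTS_AND_SOLVENTS` and the function `get_parent_smiles` are hypothetical names chosen for this example.

```python
# Hypothetical, tiny salt/solvent list for illustration only; the real
# pipeline uses its own curated list and RDKit structure matching.
SALTS_AND_SOLVENTS = {"Cl", "[Cl-]", "Br", "[Na+]", "O", "CC(=O)O", "OS(=O)(=O)O"}


def get_parent_smiles(smiles: str) -> str:
    """Return a simplified 'parent' for a dot-disconnected SMILES string.

    Components that exactly match an entry in SALTS_AND_SOLVENTS are
    dropped. If every component is a salt or solvent, the input is
    returned unchanged so that no structure is lost.
    """
    # Disconnected components in SMILES are separated by '.'
    components = smiles.split(".")
    kept = [c for c in components if c not in SALTS_AND_SOLVENTS]
    if not kept:
        # Nothing but salts/solvents: keep the original record intact
        return smiles
    return ".".join(kept)


if __name__ == "__main__":
    print(get_parent_smiles("CCN(CC)CC.Cl"))  # triethylamine hydrochloride
    print(get_parent_smiles("Cl.O"))          # only salt and solvent
```

The fallback of returning the input when every component matches the list is a design choice for this sketch, so that a record never ends up with an empty parent. The released pipeline operates on RDKit molecules and molfiles rather than raw strings, which avoids the canonicalisation assumption made here.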
The Organization of Drug Discovery Data