Posts

ChEMBL 22 Released

We are pleased to announce the release of ChEMBL 22. This version of the database, prepared on 8th August 2016, contains:

2,043,051 compound records
1,686,695 compounds (of which 1,678,393 have mol files)
14,371,219 activities
1,246,132 assays
11,224 targets
65,213 documents

Data can be downloaded from the ChEMBL FTP site or viewed via the ChEMBL interface. Please see the ChEMBL_22 release notes for full details of all changes in this release.

CHANGES SINCE THE LAST RELEASE

In addition to the regular updates to the Scientific Literature, PubChem, FDA Orange Book and USP Dictionary of USAN and INN Investigational Drug Names, this release of ChEMBL also includes the following new data:

Deposited Data Sets: Two new deposited data sets have been included in ChEMBL_22: the MMV Pathogen Box compound set ( http://www.pathogenbox.org ) and GSK Tres Cantos Follow-up TB Screening Data ( http://dx.doi.org/10.1371/journal.pone.0142293 ).

Patent Data from BindingDB: We have wo...

ChEMBL_22 is coming soon....

ChEMBL_22 will be released in the next week or two. For those of you who want to plan ahead, here is a preview of the new schema (full documentation here). We would also like to inform users that we plan to discontinue the Oracle 9i download format after this release. Please contact us as soon as possible if you rely on this version.

Join the EMBL-EBI Chemogenomics team!

We are currently seeking multiple talented individuals to join the Chemogenomics team here at EMBL-EBI, both to work on our group resources (ChEMBL, SureChEMBL) and to support external projects (FP7 HeCaToS and NIH Illuminating the Druggable Genome). If you are interested in applying for these positions (or would like more information), please follow the links below. The closing date for all positions is 12th June.

Java Back End Developer: https://ig14.i-grasp.com/fe/tpl_embl01.asp?newms=jj&id=54993&aid=15470
Web Developer: https://ig14.i-grasp.com/fe/tpl_embl01.asp?newms=jj&id=54992&aid=15470
Scientific Programmer: https://ig14.i-grasp.com/fe/tpl_embl01.asp?newms=jj&id=54991&aid=15470
Data Mining and Analysis Scientist: https://ig14.i-grasp.com/fe/tpl_embl01.asp?newms=jj&id=54990&aid=15470
Biological Data Curator: https://ig14.i-grasp.com/fe/tpl_embl01.asp?newms=jj&id=54985&aid=15470

Target Prediction Models Update

In case you have been too busy to notice, ChEMBL_21 has arrived with the usual additions, improvements and enhancements, both on the data/annotation side and on the interface/services. To complement this, we have also updated the target prediction models, which can be downloaded from our FTP here. The good news is that, besides the increase in training data (compounds and targets), the new models were built using the latest stable versions of RDKit (2015.09.2) and scikit-learn (0.17). The latter was upgraded from the much older 0.14 version, which was causing incompatibility issues (see MultiLabelBinarizer) for several of you trying to use the models. We've also put together a quick Jupyter Notebook demo on how to get predictions from the models here: https://github.com/madgpap/notebooks/blob/master/target_pred_21_demo.ipynb The new models will also be available on myChEMBL 21 along with a more detai...

This Python InChI Key resolver will blow your mind

This scientific clickbait title introduces our promised blog post about the integration of UniChem into our ChEMBL Python client. UniChem is a very important resource, as it contains information about 134 million (and counting) unique compound structures and cross-references between various chemistry resources. Since UniChem is developed in-house and provides its own web services, we thought it would make sense to integrate it with our Python client library. Before we present a systematic translation between the raw HTTP calls described in the UniChem API documentation and client calls, let us provide some preliminary information. To install the client, use pip:

pip install -U chembl_webresource_client

Once you have it installed, you can import the unichem module:

from chembl_webresource_client.unichem import unichem_client as unichem

OK, so how do you resolve an InChI Key to an InChI string? It's very simple: Of course in order to reso...

ChEMBL 21 web services update

Traditionally, along with the release of the new ChEMBL version, we have made a few updates to our RESTful API. Below you can find a short description of the most important changes:

Data API ( https://www.ebi.ac.uk/chembl/api/data/docs ):

1. New resources: Since ChEMBL 21 introduced a few new tables, we have made them available via the API. The new resources are: drug_indication, go_slim and metabolism. Moreover, the target_component endpoint has been enhanced to provide a list of related GO terms.

2. Solr-based search: a very popular feature request was the ability to search resources by keyword. A form of searching was already possible before, using filtering terms such as the [i]contains, [i]startswith and [i]endswith filters. For example, in order to search molecules for 'metazide' in their preferred name, this filter can be used: api/data/molecule?pref_name__icontains=metazide However, this approach has many drawbacks: it's executed on th...
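As a rough sketch of the filter-style query mentioned above, the snippet below only assembles the URL and sends no request. The base URL is an assumption inferred from the documentation address quoted in the post, and the helper name build_filter_url is hypothetical, not part of any ChEMBL library.

```python
from urllib.parse import urlencode

# Assumption: base URL inferred from the docs address quoted in the post.
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def build_filter_url(resource, field, lookup, value):
    """Hypothetical helper: build <base>/<resource>?<field>__<lookup>=<value>."""
    # urlencode percent-escapes the value, so terms with spaces stay valid.
    query = urlencode({f"{field}__{lookup}": value})
    return f"{BASE_URL}/{resource}?{query}"


# Case-insensitive substring filter on the preferred name, as in the post.
url = build_filter_url("molecule", "pref_name", "icontains", "metazide")
print(url)
```

Swapping the lookup for istartswith or iendswith gives the other case-insensitive filter variants the post lists.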

ChEMBL DB on SQLite, is that even possible?

Short answer: Yes. Andrew Dalke did it in 2014 for ChEMBL 19 compounds, but now it's officially supported by the ChEMBL team and covers the whole database. One thing you may notice when looking at the ChEMBL 21 FTP directory is a new file called chembl_21_sqlite.tar.gz. What's that? It's a binary SQLite database file containing all the ChEMBL 21 tables and data. If you don't know what SQLite is, it's a very lightweight database system that stores the entire database (definitions, tables, indices, and the data itself) as a single cross-platform file on a host machine. It's very popular as well, so if you have a Mac, Windows 10 or a Linux box, chances are that SQLite is already installed on your computer. Skype uses SQLite to store the local copy of conversation history, and the Python language has SQLite bundled as a core library. If it's so "lightweight", why is the SQLite ChEMBL 21 file 2.4GB, compared to less than 1.4GB for O...
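To illustrate the "single file" point with Python's bundled sqlite3 module, here is a minimal sketch that writes a toy database to disk, reopens it, and reads it back. The table demo_compounds and its rows are made up for illustration and are not part of the ChEMBL schema.

```python
import os
import sqlite3
import tempfile


def roundtrip_demo():
    """Create a one-table SQLite file, reopen it, and return its rows."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")

        # Write: schema and data all go into the one file at `path`.
        conn = sqlite3.connect(path)
        conn.execute(
            "CREATE TABLE demo_compounds (id INTEGER PRIMARY KEY, name TEXT)"
        )
        conn.executemany(
            "INSERT INTO demo_compounds (name) VALUES (?)",
            [("aspirin",), ("caffeine",)],
        )
        conn.commit()
        conn.close()

        # After a clean close, the database is just this single file.
        is_single_file = os.listdir(tmp) == ["demo.db"]

        # Read: a fresh connection needs nothing but the file path.
        conn = sqlite3.connect(path)
        rows = conn.execute(
            "SELECT id, name FROM demo_compounds ORDER BY id"
        ).fetchall()
        conn.close()
    return is_single_file, rows


single_file, rows = roundtrip_demo()
print(single_file, rows)
```

The same sqlite3.connect call should work on an unpacked SQLite download, given the path to the extracted database file.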