Posts

Showing posts from March, 2016

Target Prediction Models Update

In case you have been too busy to notice, ChEMBL_21 has arrived with the usual additions, improvements and enhancements, both on the data/annotation side and on the interface/services side. To complement this, we have also updated the target prediction models, which can be downloaded from our ftp here. The good news is that, besides the increase in training data (compounds and targets), the new models were built using the latest stable versions of RDKit (2015.09.2) and scikit-learn (0.17). The latter was upgraded from the much older 0.14 version, which was causing incompatibility issues (see MultiLabelBinarizer) for several of you while trying to use the models. We've also put together a quick Jupyter Notebook demo on how to get predictions from the models here: https://github.com/madgpap/notebooks/blob/master/target_pred_21_demo.ipynb The new models will also be available on myChEMBL 21 along with a more detai...

This Python InChI Key resolver will blow your mind

This scientific clickbait title introduces our promised blog post about the integration of UniChem into our ChEMBL Python client. UniChem is a very important resource, as it contains information about 134 million (and counting) unique compound structures and cross-references between various chemistry resources. Since UniChem is developed in-house and provides its own web services, we thought it would make sense to integrate it with our Python client library. Before we present a systematic translation between the raw HTTP calls described in the UniChem API documentation and client calls, let us provide some preliminary information: In order to install the client, you should use pip: pip install -U chembl_webresource_client Once you have it installed, you can import the unichem module: from chembl_webresource_client.unichem import unichem_client as unichem OK, so how do you resolve an InChI Key to an InChI string? It's very simple: Of course in order to reso...

ChEMBL 21 web services update

Traditionally, along with the release of the new ChEMBL version, we have made a few updates to our RESTful API. Below you can find a short description of the most important changes: Data API (https://www.ebi.ac.uk/chembl/api/data/docs): 1. New resources: Since ChEMBL 21 introduced a few new tables, we have made them available via the API. The new resources are: drug_indication, go_slim and metabolism. Moreover, the target_component endpoint has been enhanced to provide a list of related GO terms. 2. Solr-based search: a very popular feature request was the ability to search resources by a keyword. A form of searching was already possible before, using filtering terms such as the [i]contains, [i]startswith and [i]endswith filters. For example, in order to search molecules for 'metazide' in their preferred name, this filter can be used: api/data/molecule?pref_name__icontains=metazide However, this approach has many drawbacks: it's executed on th...
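As a minimal sketch of the filter syntax mentioned in the excerpt above, the snippet below assembles such a URL from its parts. The helper name build_filter_url is hypothetical, and the base URL is an assumption derived from the docs link in the post (with the trailing /docs removed). Nothing is sent over the network; the code only builds the string.

```python
from urllib.parse import urlencode

# Assumed base URL: the docs link from the post without the trailing "/docs".
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def build_filter_url(resource, field, lookup, value, base_url=BASE_URL):
    """Return a URL that filters `resource` on `field` using `lookup`.

    Hypothetical helper for illustration: the query key is written as
    <field>__<lookup>, e.g. pref_name__icontains.
    """
    query = urlencode({f"{field}__{lookup}": value})
    return f"{base_url}/{resource}?{query}"


# Rebuild the 'metazide' example from the post.
url = build_filter_url("molecule", "pref_name", "icontains", "metazide")
print(url)
```

The same helper covers the other filters named in the post by changing the lookup argument, for example "istartswith" or "iendswith".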

ChEMBL DB on SQLite, is that even possible?

Short answer: Yes; Andrew Dalke did it in 2014 for ChEMBL 19 compounds, but now it's officially supported by the ChEMBL team and covers the whole database. One thing you may notice when looking at the ChEMBL 21 FTP directory is a new file called chembl_21_sqlite.tar.gz. What's that? It's a binary SQLite database file containing all the ChEMBL 21 tables and data. If you don't know what SQLite is, it's a very lightweight database system that stores the entire database (definitions, tables, indices, and the data itself) as a single cross-platform file on a host machine. It's very popular as well, so if you have a Mac, Windows 10 or a Linux box, chances are that SQLite is already installed on your computer. Skype uses SQLite to store the local copy of conversation history, and the Python language has SQLite bundled as a core library. If it's so "lightweight", why is the SQLite ChEMBL 21 file 2.4GB, compared to less than 1.4GB for O...
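Since Python bundles SQLite as the sqlite3 module, a single-file database can be inspected with no extra installation. The sketch below lists the tables in a database; it runs against a small in-memory stand-in with made-up table names so that it works without the 2.4GB download. The helper name list_tables is hypothetical, and the name of the database file inside the archive is not stated in the excerpt, so no path is assumed here.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in an open SQLite connection, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


# Stand-in database with illustrative (not ChEMBL) table names.
# For the real file, pass the path of the extracted database file
# to sqlite3.connect() instead of ":memory:".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_compounds (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE demo_targets (id INTEGER PRIMARY KEY, name TEXT)")

print(list_tables(conn))
```

Querying sqlite_master is the standard way to discover what a SQLite file contains, which is a reasonable first step before writing queries against an unfamiliar schema.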

ChEMBL 21 Released

We are pleased to announce the release of ChEMBL_21. This version of the database was prepared on 1st February 2016 and contains:

• 1,929,473 compound records
• 1,592,191 compounds (of which 1,583,897 have mol files)
• 13,968,617 activities
• 1,212,831 assays
• 11,019 targets
• 62,502 source documents

Data can be downloaded from the ChEMBL FTP site or viewed via the ChEMBL interface. Please see the ChEMBL_21 release notes for full details of all changes in this release.

CHANGES SINCE THE LAST RELEASE

In addition to the regular updates to the Scientific Literature, PubChem, FDA Orange Book and USP Dictionary of USAN and INN Investigational Drug Names, this release of ChEMBL also includes the following new data:

* Data Depositions

Eight new deposited data sets have been included in ChEMBL_21. These include HepG2 cell viability data for the Gates Library Compound Collection from the University of Dundee, three depositions f...