In case you have been too busy to notice, ChEMBL_21 has arrived with the usual additions, improvements and enhancements, both on the data/annotation side and on the interface/services side. To complement this, we have also updated the target prediction models, which can be downloaded from our FTP here.
The good news is that, besides the increase in training data (compounds and targets), the new models were built using the latest stable versions of RDKit (2015.09.2) and scikit-learn (0.17). The latter was upgraded from the much older 0.14 version, which was causing incompatibility issues (see MultiLabelBinarizer) for several of you who tried to use the models.
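Since pickled scikit-learn models tend to be sensitive to the library version they were built with, it may be worth checking your environment before loading them. The snippet below is only a minimal sketch: the helper names (`parse_version`, `same_release`, `check_module`) are made up for illustration, and it assumes each library exposes a `__version__` attribute.

```python
import importlib
import re

# Versions named in this post; adjust if you use a different model release.
EXPECTED = {"sklearn": "0.17", "rdkit": "2015.09.2"}


def parse_version(text):
    """Turn '0.17.1' into (0, 17, 1); stop at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def same_release(installed, expected):
    """True if `installed` starts with the same components as `expected`."""
    wanted = parse_version(expected)
    return parse_version(installed)[:len(wanted)] == wanted


def check_module(name, expected):
    """Return a warning string, or None if the installed version matches."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return "%s is not installed" % name
    installed = getattr(module, "__version__", None)
    if installed is None:
        return "%s does not report a version" % name
    if not same_release(installed, expected):
        return "%s is %s, models were built with %s" % (name, installed, expected)
    return None


if __name__ == "__main__":
    for name, expected in sorted(EXPECTED.items()):
        print(check_module(name, expected) or "%s looks fine" % name)
```

A mismatch does not necessarily mean the models will fail to load, but it is the first thing to rule out if unpickling complains.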
We've also put together a quick Jupyter Notebook demo on how to get predictions from the models here:
The new models will also be available on myChEMBL 21 along with a more detailed and elaborate Jupyter Notebook.
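To give a rough idea of what getting predictions involves, here is a minimal sketch rather than the notebook itself. It assumes the downloaded model is a pickled scikit-learn classifier exposing `predict_proba` and `classes_`, and that it was trained on 2048-bit Morgan fingerprints of radius 2; both are assumptions, so check the notebook for the actual settings. The function names and the model path are placeholders.

```python
def smiles_to_bits(smiles, radius=2, n_bits=2048):
    """Morgan fingerprint of a SMILES string as a list of 0/1 ints.

    radius and n_bits are assumed values, not confirmed model settings.
    """
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    return [int(bit) for bit in fp.ToBitString()]


def top_targets(labels, probabilities, n=5):
    """Pair each label with its probability and keep the n highest."""
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


def predict_targets(model_path, smiles, n=5):
    """Load a pickled model (hypothetical path) and rank targets for one compound."""
    try:
        import joblib
    except ImportError:
        # scikit-learn 0.17 bundled joblib under sklearn.externals
        from sklearn.externals import joblib

    model = joblib.load(model_path)
    probabilities = model.predict_proba([smiles_to_bits(smiles)])[0]
    return top_targets(model.classes_, probabilities, n)
```

Calling `predict_targets("path/to/model.pkl", "CC(=O)Oc1ccccc1C(=O)O")` would then return up to five (target, probability) pairs, provided the assumptions above hold for the file you downloaded.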
On a side note, am I allowed to be impressed by how easy it is nowadays to install Python and RDKit?
It is literally just a matter of 6 commands and 5 minutes (on my Mac):
curl -o miniconda.sh http://repo.continuum.io/miniconda/Miniconda3-3.8.3-MacOSX-x86_64.sh
bash miniconda.sh
conda create -n rd27 python=2.7
source activate rd27
conda install ipython ipython-notebook pillow pandas requests
conda install -c https://conda.anaconda.org/rdkit rdkit
To put things in perspective: the first time I tried to compile Python and RDKit from scratch was in 2010 (on a RedHat 5.6 machine, of course); it took me about 4 days :)