Posts

ChEMBL 20 Released

We are pleased to announce the release of ChEMBL_20. This version of the database was prepared on 14th January 2015 and contains:

1,715,135 compound records
1,463,270 compounds (of which 1,456,020 have mol files)
13,520,737 activities
1,148,942 assays
10,774 targets
59,610 source documents

You can query the ChEMBL 20 data online via the ChEMBL Interface, and you can also download the data from the ChEMBL FTP site. Please see the ChEMBL_20 release notes for full details of all changes in this release.

Changes since the last release

In addition to the regular updates to the Scientific Literature, PubChem, FDA Orange Book and USP Dictionary of USAN and INN Investigational Drug Names, this release of ChEMBL also includes the following new datasets:

AstraZeneca in vitro DMPK and physicochemical properties

AstraZeneca have provided experimental data on a set of publicly disclosed compounds in the following ADMET-related assays: pKa, lipophilic...

The ChEMBL Roadshow: Part II

After the very successful US East Coast ChEMBL Roadshow, we (Anne and George) will be on the road once again next week to spread the word on ChEMBL, myChEMBL and SureChEMBL. This time we will visit the US West Coast and specifically these venues:

Tuesday 27th Jan: University of New Mexico, Albuquerque.
Wednesday 28th: MolSoft/UCSD/Scripps, San Diego. See also here.
Thursday 29th: IBM Research Centre, San Jose.
Friday 30th: UCSF, San Francisco.

If you are nearby and would like to attend or meet us for a chat, please get in touch. We are grateful to SMSdrug.net for funding.

George & Anne

ChEMBL 20 schema

For those who just can't wait to update their code... Here is a picture of the new ChEMBL_20 schema.

ChEMBL 20 coming soon...

Happy New Year for 2015 from the ChEMBL group! Release 20 of the ChEMBL database will be happening around the end of the month, and for those who can't wait, here's a preview of the exciting new features you can expect to find there:

HELM notation - we have developed an implementation of the Pistoia Alliance's HELM standard for biotherapeutics and will be supplying HELM notation for just under 20K peptides (previously represented by mol files). We will also make our monomer library available in case others wish to use it to generate their own HELM notation.

Structural alerts - in place of the old 'Med Chem Friendly' flag used in ChEMBL, we now have an extensive set of structural alerts calculated for the ChEMBL compounds. The data set includes eight different sets of alerts (including sets published by Pfizer, Glaxo, BMS, University of Dundee, NIH MLSMR and PAINS filters) providing more than 1100 distinct SMARTS. Alerts found for a given compound can be vie...

Accessing web services with cURL

ChEMBL web services are really friendly. We provide live online documentation, as well as support for CORS and JSONP to help web developers create their own web widgets. For Python developers, we provide a dedicated client library, along with examples that use the client and the well-known requests library in the form of an IPython notebook. There are also examples for Java and Perl; you can find them here. But this is nothing for real UNIX/Linux hackers. Real hackers use cURL, and there is a good reason to do so. cURL comes preinstalled on many Linux distributions as well as OS X. It follows the Unix philosophy and can be joined with other tools using pipes. Finally, it can be used inside bash scripts, which is very useful for automating tasks. Unfortunately, first experiences with cURL can be frustrating. For example, after studying the cURL manual pages, one may think that the following will return a set of compounds in JSON format: But the result is quite disappointing... The reason is...
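For readers who would rather script the same kind of call in Python, here is a minimal sketch using only the standard library. It builds a GET request that asks for JSON through an `Accept` header, which is one common way to negotiate a response format over HTTP; the excerpt above is cut off before it gives its own explanation, so this is not necessarily the fix the post goes on to describe. The base URL, the `molecule` resource name and the `limit` parameter are assumptions made for illustration and do not come from the excerpt.

```python
import json
import urllib.parse
import urllib.request

# Assumption: base URL of the ChEMBL web services (not stated in the excerpt).
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def build_json_request(resource, params=None, base_url=BASE_URL):
    """Build (but do not send) a GET request that asks for a JSON response."""
    url = f"{base_url}/{resource}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    # The Accept header states which response format the client prefers.
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def fetch_json(request, timeout=30):
    """Send the request and decode the JSON body (requires network access)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical resource and parameter names, used only for illustration.
request = build_json_request("molecule", {"limit": 5})
print(request.full_url)
print(request.get_header("Accept"))
```

Building the request separately from sending it makes it easy to inspect the URL and headers before any traffic leaves the machine; calling `fetch_json(request)` would then perform the actual call.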

Finding key compounds in med. chemistry patents: The open way

A couple of us attended the 3rd RDKit UGM, hosted by Merck in Darmstadt this year. It was an excellent opportunity to catch up with RDKit developments and applications and meet up with other loyal "RDKitters". I presented a talk-torial there and went through an IPython Notebook, which some of you may find useful. It uses patent chemistry data extracted from SureChEMBL and, after a series of filtering steps, applies a few "traditional" chemoinformatics approaches to a set of claimed compounds. My ultimate aim was to identify "key compounds" in patents using compound information alone, inspired by papers such as this and this. The crucial difference is that these authors used commercial data and software, whereas in this implementation everything is free and open. At the same time, I wanted to show off what the combination of pandas, scikit-learn, mpld3, Beaker, RDKit, IPython Notebook and SureChEMBL can do nowadays (hint: a lot). So, ...

Using ChEMBL web services via proxy.

It is common practice for organizations and companies to use proxy servers to connect to services outside their network. This can cause problems for users of the ChEMBL web services who sit behind a proxy server. To help those users who have asked, we provide the following quick guide, which demonstrates how to access ChEMBL web services via a proxy. Most software libraries respect proxy settings from environment variables. You can set one proxy variable, normally HTTP_PROXY, and then use that variable to set the other related proxy environment variables: Or, if you have different proxies responsible for different protocols: On Windows, this would be: If you are accessing the ChEMBL web services programmatically and you prefer not to clutter your environment, you can consider adding the proxy settings to your scripts. Here are some Python-based recipes: 1. Official ChEMBL client library If you are working in a Python-based environment, we recommend...
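As a rough illustration of the environment-variable approach described above, the following standard-library sketch writes one proxy URL into both HTTP_PROXY and HTTPS_PROXY, reads them back into a scheme-to-proxy mapping, and builds a `urllib` opener that routes requests through it. The proxy address `http://proxy.example.org:3128` and the helper names are hypothetical; this is not the recipe from the truncated post, only one way the idea might look in a script.

```python
import urllib.request


def set_proxy_env(env, http_proxy, https_proxy=None):
    """Store proxy settings in a mapping such as os.environ.

    If no separate HTTPS proxy is given, the HTTP proxy is reused for both.
    """
    env["HTTP_PROXY"] = http_proxy
    env["HTTPS_PROXY"] = https_proxy or http_proxy
    return env


def proxies_from_env(env):
    """Translate the proxy variables into a {scheme: proxy_url} mapping."""
    proxies = {}
    for scheme, name in (("http", "HTTP_PROXY"), ("https", "HTTPS_PROXY")):
        if env.get(name):
            proxies[scheme] = env[name]
    return proxies


def build_proxy_opener(proxies):
    """Create an opener that sends requests through the given proxies."""
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))


# Hypothetical proxy address; a plain dict stands in for os.environ here
# so that running the sketch does not change the real environment.
env = set_proxy_env({}, "http://proxy.example.org:3128")
proxies = proxies_from_env(env)
opener = build_proxy_opener(proxies)
print(proxies)
```

Passing `os.environ` instead of the plain dict would apply the settings to the current process, and `opener.open(url)` would then send a request through the configured proxy.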