The ChEMBL-og

Posts

Accessing web services with cURL

ChEMBL web services are really friendly. We provide live online documentation , support for CORS and JSONP techniques to support web developers in creating their own web widgets. For Python developers, we provide dedicated client library as well as examples using the client and well known requests library in a form of ipython notebook . There are also examples for Java and Perl, you can find it here . But this is nothing for real UNIX/Linux hackers. Real hackers use cURL . And there is a good reason to do so. cURL comes preinstalled on many Linux distributions as well as OSX. It follows Unix philosophy and can be joined with other tools using pipes . Finally, it can be used inside bash scripts which is very useful for automating tasks. Unfortunately first experiences with cURL can be frustrating. For example, after studying cURL manual pages , one may think that following will return set of compounds in json format: But the result is quite dissapointing... The reason is...

Finding key compounds in med. chemistry patents: The open way

A couple of us attended the 3rd RDKit UGM , hosted by Merck in Darmstadt this year. It was an excellent opportunity to catch up with RDKit developments and applications and meet up with other loyal "RDKitters". I presented a talk-torial there and went through an IPython Notebook, which some of you may find useful. It uses patent chemistry data extracted from SureChEMBL and after a series of filtering steps, it follows a few "traditional" chemoinformatics approaches with a set of claimed compounds. My ultimate aim was to identify "key compounds" in patents using compound information alone, inspired by papers such as this and this . The crucial difference is that these authors used commercial data and software, where in this implementation everything is free and open. At the same time, I wanted to show off what the combination of pandas, scikit-learn, mpld3, Beaker, RDKit, IPython Notebook and SureChEMBL can do nowadays (hint: a lot). So, ...

Using ChEMBL web services via proxy.

It is common practice for organizations and companies to make use of proxy servers to connect to services outside their network. This can cause problems for users of the ChEMBL web services who sit behind a proxy server. So to help those users who have asked, we provide the following quick guide, which demonstrates how to access ChEMBL web services via a proxy. Most software libraries respect proxy settings from environmental variables. You can set the proxy variable once, normally HTTP_PROXY and then use that variable to set other related proxy environment variables: Or if you have different proxies responsible for different protocols: On Windows, this would be: If you are accessing the ChEMBL web services programmatically and you prefer not to clutter your environment, you can consider adding the proxy settings to your scripts. Here are some python based recipes: 1. Official ChEMBL client library If you are working in a python based environment, we recommend...

An overview and invitation to contribute to ChEMBL curation with PPDMs

PPDMs has been in the making for more than a year and is a follow-up on a conference paper we published in 2012. As in 2012, our objective is to map small molecule binding sites to protein domains, the structural units that form recurring building blocks in the evolution of proteins. An application note describing PPDMs is just out in Bioinformatics . Mapping small molecule binding to protein domains The mapping facilitates the functional interpretation of small molecule-protein interactions - if you understand which domain in a protein is targeted, you are in a better position to anticipate the downstream effect. Mapping small molecule binding to protein domains also provides a technical advantage to machine-learning approaches that incorporate protein sequence information as a descriptor to predict small molecule bioactivity. Reducing the sequence descriptor to the part that mediates small molecule binding increases the informative content of the descriptor. This is best e...

Paper: PPDMs – A resource for mapping small molecule bioactivities from ChEMBL to Pfam-A protein domains

We've just published a Open Access paper in Bioinformatics on an approach to annotate the region of ligand binding within a target protein. This has a lot of applications in the use of ChEMBL , in particular providing greater accuracy in mapping functional effects, improving ligand-based target prediction approaches, and reducing false positives in sequence/target searching of ChEMBL. Where next for this work - well annotating to a site-specific level would be a good thing to implement (think about HIV-1 RT with the distinct nucleoside and non-nucleoside sites). Here's the abstract... Summary : PPDMs is a resource that maps small molecule bioactivities to protein domains from the Pfam-A collection of protein families. Small molecule bioactivities mapped to protein domains add important precision to approaches that use protein sequence searches alignments to assist applications in computational drug discovery and systems and chemical biology. We have previously propos...

Django model describing ChEMBL database.

TL;DR: We have just open sourced our Django ORM Model, which describes the ChEMBL relational database schema. This means you no longer need to write another line of SQL code to interact with ChEMBL database. We think it is pretty cool and we are using it in the ChEMBL group to make our lives easier. Read on to find out more.... It is never a good idea to use SQL code directly in python. Let's see some basic examples explaining why: Can you see what is wrong with the code above? SQL keyword `JOIN` was misspelled as 'JION'. But it's hard to find it quickly because most of code highlighters will apply Python syntax rules and ignore contents of strings. In our case the string is very important as it contains SQL statement. The problem above can be easily solved using some simple Python SQL wrapper, such as edendb . This wrapper will provide set of functions to perform database operations for example 'select', 'insert', 'delete': No...

myChEMBL 19 Released

We are very pleased to announce that the latest myChEMBL release, based on the ChEMBL 19 database , is now available to download . In addition to the extra data, you will also find a number a great new features. So what's new then? More core chemoinformatics tools We have included OSRA (Optical Structure Recognition), which is useful for extracting compound structures from images. OSRA can be accessed from the command line or by very convenient web interface, provided by Beaker (described below). We've also added OpenBabel - another great open source cheminformatics toolkit. This means you can now experiment with both RDKit and OpenBabel and use whichever you prefer. ChEMBL Beaker myChEMBL now ships with a local instance the ChEMBL Beaker service. For those not familiar with Beaker, the service provides users with an array of chemoinformatics utilities via a RESTful API. Under the h...