Posts

Finding key compounds in med. chemistry patents: The open way

A couple of us attended the 3rd RDKit UGM, hosted by Merck in Darmstadt this year. It was an excellent opportunity to catch up with RDKit developments and applications and to meet other loyal "RDKitters". I presented a talk-torial there and went through an IPython Notebook, which some of you may find useful. It uses patent chemistry data extracted from SureChEMBL and, after a series of filtering steps, applies a few "traditional" chemoinformatics approaches to a set of claimed compounds. My ultimate aim was to identify "key compounds" in patents using compound information alone, inspired by papers such as this and this. The crucial difference is that those authors used commercial data and software, whereas in this implementation everything is free and open. At the same time, I wanted to show off what the combination of pandas, scikit-learn, mpld3, Beaker, RDKit, IPython Notebook and SureChEMBL can do nowadays (hint: a lot). So, ...

Using ChEMBL web services via proxy.

It is common practice for organizations and companies to use proxy servers to connect to services outside their network. This can cause problems for users of the ChEMBL web services who sit behind a proxy server. To help those users who have asked, we provide the following quick guide, which demonstrates how to access ChEMBL web services via a proxy. Most software libraries respect proxy settings from environment variables. You can set the proxy variable once, normally HTTP_PROXY, and then use that variable to set other related proxy environment variables: Or if you have different proxies responsible for different protocols: On Windows, this would be: If you are accessing the ChEMBL web services programmatically and you prefer not to clutter your environment, you can consider adding the proxy settings to your scripts. Here are some Python-based recipes: 1. Official ChEMBL client library If you are working in a Python-based environment, we recommend...
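As a rough sketch of the script-level approach mentioned above, the following uses only Python's standard library (`urllib.request`) rather than the ChEMBL client library. The proxy address `proxy.example.org:3128` and the helper names `proxy_settings` and `build_proxy_opener` are placeholders invented for illustration; substitute your organisation's own proxy.

```python
import urllib.request


def proxy_settings(http_proxy, https_proxy=None):
    """Map each protocol to a proxy URL.

    If no separate HTTPS proxy is given, the HTTP proxy is reused.
    """
    return {"http": http_proxy, "https": https_proxy or http_proxy}


def build_proxy_opener(proxies):
    """Return an opener that routes requests through the given proxies."""
    handler = urllib.request.ProxyHandler(proxies)
    return urllib.request.build_opener(handler)


# Placeholder address (an assumption, not a real proxy).
proxies = proxy_settings("http://proxy.example.org:3128")
opener = build_proxy_opener(proxies)

# Requests made with opener.open(url) are now sent via the proxy,
# without touching HTTP_PROXY or other environment variables.
```

Keeping the settings in a plain dictionary means the same mapping can be handed to other HTTP libraries that accept a protocol-to-proxy mapping.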

An overview and invitation to contribute to ChEMBL curation with PPDMs

PPDMs has been in the making for more than a year and is a follow-up to a conference paper we published in 2012. As in 2012, our objective is to map small molecule binding sites to protein domains, the structural units that form recurring building blocks in the evolution of proteins. An application note describing PPDMs is just out in Bioinformatics. Mapping small molecule binding to protein domains: The mapping facilitates the functional interpretation of small molecule-protein interactions. If you understand which domain in a protein is targeted, you are in a better position to anticipate the downstream effect. Mapping small molecule binding to protein domains also provides a technical advantage to machine-learning approaches that incorporate protein sequence information as a descriptor to predict small molecule bioactivity. Reducing the sequence descriptor to the part that mediates small molecule binding increases the information content of the descriptor. This is best e...

Paper: PPDMs – A resource for mapping small molecule bioactivities from ChEMBL to Pfam-A protein domains

We've just published an Open Access paper in Bioinformatics on an approach to annotate the region of ligand binding within a target protein. This has many applications in the use of ChEMBL, in particular providing greater accuracy in mapping functional effects, improving ligand-based target prediction approaches, and reducing false positives in sequence/target searching of ChEMBL. Where next for this work? Well, annotating to a site-specific level would be a good thing to implement (think about HIV-1 RT with the distinct nucleoside and non-nucleoside sites). Here's the abstract... Summary: PPDMs is a resource that maps small molecule bioactivities to protein domains from the Pfam-A collection of protein families. Small molecule bioactivities mapped to protein domains add important precision to approaches that use protein sequence searches and alignments to assist applications in computational drug discovery and systems and chemical biology. We have previously propos...

Django model describing ChEMBL database.

TL;DR: We have just open sourced our Django ORM Model, which describes the ChEMBL relational database schema. This means you no longer need to write another line of SQL code to interact with the ChEMBL database. We think it is pretty cool and we are using it in the ChEMBL group to make our lives easier. Read on to find out more.... It is never a good idea to use SQL code directly in Python. Let's see some basic examples explaining why: Can you see what is wrong with the code above? The SQL keyword `JOIN` was misspelled as `JION`. But it's hard to spot quickly because most code highlighters apply Python syntax rules and ignore the contents of strings. In our case the string is very important as it contains the SQL statement. The problem above can be easily solved using a simple Python SQL wrapper, such as edendb. This wrapper provides a set of functions to perform database operations, for example 'select', 'insert', 'delete': No...
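To make the misspelled-keyword problem concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The two-table schema, the data and the helper `run` are made up for this illustration and are not the ChEMBL schema; the point is only that the typo lives inside a string, so nothing complains until the database parses it at run time.

```python
import sqlite3

# Hypothetical two-table schema, invented for this example (not the ChEMBL schema).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE molecule (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE activity (molecule_id INTEGER, value REAL);
    INSERT INTO molecule VALUES (1, 'mol-1');
    INSERT INTO activity VALUES (1, 5.0);
    """
)

GOOD_SQL = "SELECT m.name, a.value FROM molecule m JOIN activity a ON a.molecule_id = m.id"
BAD_SQL = "SELECT m.name, a.value FROM molecule m JION activity a ON a.molecule_id = m.id"


def run(sql):
    """Execute a query and return all rows."""
    return conn.execute(sql).fetchall()


rows = run(GOOD_SQL)

# To Python, BAD_SQL is just another valid string; the typo only
# surfaces when the database tries to parse it.
try:
    run(BAD_SQL)
    typo_error = None
except sqlite3.OperationalError as exc:
    typo_error = exc
```

An ORM or query wrapper avoids this class of error by building the SQL from Python objects, so a misspelled method or attribute name fails as ordinary Python code, where editors and linters can see it.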

myChEMBL 19 Released

We are very pleased to announce that the latest myChEMBL release, based on the ChEMBL 19 database, is now available to download. In addition to the extra data, you will also find a number of great new features. So what's new then? More core chemoinformatics tools: We have included OSRA (Optical Structure Recognition), which is useful for extracting compound structures from images. OSRA can be accessed from the command line or via a very convenient web interface, provided by Beaker (described below). We've also added OpenBabel, another great open source cheminformatics toolkit. This means you can now experiment with both RDKit and OpenBabel and use whichever you prefer. ChEMBL Beaker: myChEMBL now ships with a local instance of the ChEMBL Beaker service. For those not familiar with Beaker, the service provides users with an array of chemoinformatics utilities via a RESTful API. Under the h...

New Drug Approvals 2014 - Pt. XII - Naloxegol (Movantik™)

ATC Code: A06AH03; Wikipedia: Naloxegol; ChEMBL: CHEMBL2219418. On September 16th the FDA approved Movantik (naloxegol, AZ-13337019), as an oral treatment for patients with opioid-induced constipation and chronic non-cancer pain. Naloxegol is an opioid receptor antagonist. Due to its similarity to noroxymorphone, a main metabolite of oxycodone, naloxegol is classed as a controlled substance. However, the FDA analysed its abuse potential and concluded that there was no risk of dependency. Mode of Action: Opioids are a class of drugs which are used to manage pain, but have a common side effect of reducing the motility of the gastrointestinal tract, making bowel movements difficult. Opioids work by binding to the mu-receptors (CHEMBL233, UniProt:P35372) in the central nervous system, thereby reducing pain. However, they are also able to bind to the mu-receptors in the gastrointestinal tract, hence causing op...