Showing posts from August, 2015

Online Resources

We would like to announce the recent release of two online training resources for ChEMBL and UniChem. For ChEMBL, we have developed 'ChEMBL: Exploring bioactive drug-like molecules', which walks you through how to use the interface step by step. It covers topics such as target searching, compound searching, web services and data downloads, and gives you a chance to test your knowledge throughout. Additionally, Jon Chambers has created the 'UniChem: Quick Tour' course. This course gives users a basic understanding of UniChem and the benefits it can bring to navigating small-molecule resources. It also walks you through how to conduct simple searches using UniChem and the UniChem Connectivity Search feature. I'd also like to remind you that we store recordings of our past webinars, in case you missed them. You can access these anytime and they can be found here:

Accessing SureChEMBL data in bulk

It is the peak of summer (at least in this hemisphere) and many of our readers/users will be on holiday, perhaps on an island enjoying the sea. Luckily, for the rest of us there is still the 'sea' of SureChEMBL data waiting to be enjoyed and explored for hidden 'treasures' (let me know if I pushed this analogy too far). See here and here for a reminder of what SureChEMBL is and what it does. This wealth of (big) data can be accessed via the SureChEMBL interface, where users can submit quite sophisticated and granular queries by combining: i) Lucene fields against full-text and bibliographic metadata and ii) advanced structure query features against the annotated compound corpus. Examples of such queries will be the topic of a future post. Once the search results are back, users can browse through and export the chemistry from the patent(s) of interest. In addition to this functionality, we've been receiving user requests for local (behind the

LSH-based similarity search in MongoDB is faster than postgres cartridge.

TL;DR: In his excellent blog post, Matt Swain described the implementation of compound similarity searches in MongoDB. Unfortunately, Matt's approach had suboptimal (polynomial) time complexity with respect to decreasing similarity thresholds, which renders it unsuitable for production environments. In this article, we improve on the method by adding a Locality Sensitive Hashing (LSH) algorithm, which significantly reduces query time and outperforms the RDKit PostgreSQL cartridge.

myChEMBL 21 - NoSQL edition

Given that NoSQL technologies applied to computational chemistry and cheminformatics are gaining traction and popularity, we decided to include a taster in future myChEMBL releases. Two especially appealing technologies are Neo4j and MongoDB. The former is a graph database and the latter is a BSON document store. We would like to provide IPython notebook-based tutorials explaining how to use this software to deal with common cheminformatics p
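
To make the Locality Sensitive Hashing idea from the TL;DR more concrete, here is a minimal in-memory sketch in plain Python. It is not the implementation from the post and does not use MongoDB or RDKit: fingerprints are assumed to be non-empty sets of on-bit indices, MinHash signatures are cut into bands to pick candidate buckets, and only those candidates get an exact Tanimoto check. The names `LSHIndex`, `minhash`, `NUM_HASHES` and `BANDS` are hypothetical and the parameter values are arbitrary.

```python
import hashlib
from collections import defaultdict

NUM_HASHES = 32  # length of each MinHash signature (illustrative value)
BANDS = 16       # signature is cut into 16 bands of 2 values each


def _hash(seed, bit):
    """Deterministic 64-bit hash of one fingerprint bit under a given seed."""
    data = f"{seed}:{bit}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(bits):
    """MinHash signature of a non-empty set of on-bit indices."""
    return tuple(min(_hash(seed, b) for b in bits) for seed in range(NUM_HASHES))


def tanimoto(a, b):
    """Exact Tanimoto (Jaccard) similarity of two non-empty bit sets."""
    return len(a & b) / len(a | b)


class LSHIndex:
    """Toy index: molecules sharing any band of their signature share a bucket."""

    def __init__(self):
        self.buckets = defaultdict(set)
        self.fingerprints = {}

    def _keys(self, signature):
        rows = NUM_HASHES // BANDS
        for band in range(BANDS):
            yield band, signature[band * rows:(band + 1) * rows]

    def add(self, name, bits):
        bits = frozenset(bits)
        self.fingerprints[name] = bits
        for key in self._keys(minhash(bits)):
            self.buckets[key].add(name)

    def query(self, bits, threshold):
        """Return (name, similarity) pairs at or above threshold, best first."""
        bits = frozenset(bits)
        candidates = set()
        for key in self._keys(minhash(bits)):
            candidates |= self.buckets.get(key, set())
        # Only candidates from matching buckets get the exact comparison.
        hits = [(n, tanimoto(bits, self.fingerprints[n])) for n in candidates]
        return sorted((h for h in hits if h[1] >= threshold), key=lambda h: -h[1])
```

The point of the design is that a query only compares against molecules that landed in at least one shared bucket, rather than scanning every fingerprint. The trade-off is that LSH is approximate: a molecule above the threshold can be missed if none of its bands match, and the number of bands and rows per band controls how often that happens.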