Skip to main content

Posts

Showing posts from August, 2015

Online Resources

We would like to announce the recent release of two online training resources for ChEMBL and UniChem.

Accessing SureChEMBL data in bulk

It is the peak of the summer (at least in this hemisphere) and many of our readers/users will be on holiday, perhaps on an island enjoying the sea. Luckily, for the rest of us there is still the 'sea' of SureChEMBL data that awaits to be enjoyed and explored for hidden 'treasures' (let me know if I pushed this analogy too far). See here and here for a reminder of SureChEMBL is and what it does. 
This wealth of (big) data can be accessed via the SureChEMBL interface, where users can submit quite sophisticated and granular queries by combining: i) Lucene fields against full-text and bibliographic metadata and ii) advanced structure query features against the annotated compound corpus. Examples of such queries will be the topic of a future post. Once the search results are back, users can browse through and export the chemistry from the patent(s) of interest. In addition to this functionality, we've been receiving user requests for local (behind the firewall), batch ac…

LSH-based similarity search in MongoDB is faster than postgres cartridge.

TL;DR: In his excellent blog post, Matt Swain described the implementation of compound similarity searches in MongoDB. Unfortunately, Matt's approach had suboptimal (polynomial) time complexity with respect to decreasing similarity thresholds, which renders unsuitable for production environments. In this article, we improve on the method by enhancing it with Locality Sensitive Hashing algorithm, which significantly reduces query time and outperforms RDKit PostgreSQL cartridge.

myChEMBL 21 - NoSQL edition  Given that NoSQL technologies applied to computational chemistry and cheminformatics are gaining traction and popularity, we decided to include a taster in future myChEMBL releases. Two especially appealing technologies are Neo4j and MongoDB. The former is a graph database and the latter is a BSON document storage. We would like to provide IPython notebook-based tutorials explaining how to use this software to deal with common cheminformatics problems.
Matt Swain's article &qu…