
Posts

Compound popularity contest

Have you ever wondered which compound is the most popular in ChEMBL? And by popular I don't mean the one that cracks the best jokes at dinner parties; I mean the compound with the largest number of structural analogues, or nearest neighbours (NNs). This number also indicates how sparse or dense the chemical space around a compound is, which makes it a useful concept during hit expansion and lead optimisation. Of course, the number depends on the fingerprint, the hashing and folding parameters, the similarity coefficient and the threshold. So let's say 2048-bit RDKit Morgan fingerprints with a radius of 2 or 3 (equivalent to ECFP_4 or ECFP_6) and a Tanimoto threshold of 0.5. Why such a low threshold? For an explanation, see here and here. To calculate this compound 'popularity', one would need to calculate the full similarity matrix of the 1.4M compounds in ChEMBL. This used to be prohibitively computationally expensive just a few years ago; nowadays,...
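To make the idea concrete, here is a minimal pure-Python sketch of the brute-force approach: count, for every compound, how many others reach a Tanimoto similarity of at least 0.5, then pick the compound with the highest count. It assumes fingerprints are already available as integer bitsets (bit i set means feature i is present); in practice one would generate 2048-bit Morgan fingerprints with RDKit (for example `AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)`) and use its optimised `DataStructs.BulkTanimotoSimilarity`. The toy fingerprints and the helper names `tanimoto` and `neighbour_counts` are illustrative only. Note that the loop visits n(n-1)/2 pairs, which for 1.4M compounds is roughly 10^12 comparisons; that quadratic growth is what made the full matrix so expensive.

```python
from __future__ import annotations

from itertools import combinations


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as integer bitsets."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # two empty fingerprints: define similarity as 0
    return bin(a & b).count("1") / union


def neighbour_counts(fps: list[int], threshold: float = 0.5) -> list[int]:
    """For each fingerprint, count the others with Tanimoto >= threshold.

    Walks the upper triangle of the full similarity matrix (O(n^2) pairs)
    and credits each qualifying pair to both compounds.
    """
    counts = [0] * len(fps)
    for i, j in combinations(range(len(fps)), 2):
        if tanimoto(fps[i], fps[j]) >= threshold:
            counts[i] += 1
            counts[j] += 1
    return counts


# Toy 6-bit fingerprints standing in for real 2048-bit Morgan fingerprints.
fps = [0b001111, 0b000111, 0b000011, 0b110000]
counts = neighbour_counts(fps)
# Index of the compound with the most neighbours (first one wins on ties).
most_popular = max(range(len(fps)), key=counts.__getitem__)
print(counts, most_popular)  # [2, 2, 2, 0] 0
```

Exploiting the symmetry of the Tanimoto coefficient halves the work, but the cost is still quadratic in the number of compounds.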

What's Going On?

I’ve been asked a lot by mail recently, ‘What’s Going On?’ Well, here are some facts and some emotion. Today is my last day at work here at EMBL-EBI. It’s been a fun and thrilling ride (for me at least): I’ve made lots of new friends, lived life as an Open Data advocate and academic researcher, and, most importantly, had the privilege of leading the team here responsible for the ChEMBL database. It had been a long-term goal of mine to unlock large-scale bioactivity data from proprietary data silos and eye-wateringly expensive paywalls; so, as US President George Bush famously said, ‘Mission Accomplished!’. The impact of ChEMBL on academia, SMEs and large pharma has been great; you can see it in new method development, but more importantly in potential new future drugs. My personal indebtedness to the Wellcome Trust for their support is immeasurable. An additional big shout out to Digital Science for their vision in donating the SureChEMBL platform to...

Easter egg hunt results.

As promised, we would like to provide the answer to the Easter egg hunt competition and announce the winner. Exactly seven hours after publishing the blog post, we received a comment with the correct answer. The author of the comment was Matt Swain, who runs a blog about cheminformatics. You can verify his answer by visiting the password-protected link in the original post. The password is: fu']zxp+Wm[Kc3-N

Congratulations to Matt!

ChEMBL team

Upcoming Webinars

We are pleased to announce a new round of resource-specific webinars that will be given in May and June 2015. These four webinars will cover UniChem, ChEMBL, MyChEMBL (the ChEMBL Virtual Machine) and the ChEMBL Web Services.

UniChem, 4pm BST, 13th May 2015, Register Here
ChEMBL Walkthrough, 4pm BST, 20th May 2015, Register Here
MyChEMBL Walkthrough, 4pm BST, 10th June 2015, Register Here
ChEMBL Web Services, 4pm BST, 17th June 2015, Currently Postponed

For those of you who can't make these days/times, each one-hour webinar will be recorded and made available to watch on YouTube afterwards. We will also make the slides available for download. The video for last month's SureChEMBL webinar can be found here (and part 2 here). For more information about the webinars, or to suggest other topics to cover, please contact chembl-help@ebi.ac.uk.

Easter egg hunt

Easter is coming, and for all those who don't know what to do with their spare time and fancy entering a little competition, we've prepared a small challenge. Easter Egg? In software development, an Easter egg is a funny (but harmless) undocumented feature hidden from users in an unusual place. Excel 97 has its Flight Simulator, Firefox has its about:robots address and Debian's apt-get has a moo command. The ChEMBL web services have now joined this list, and we invite you to find the hidden feature and share it with others. But why? We would like to encourage you to look at the source code of our web services. Reading code is an essential developer skill, as it helps you understand how the code works. This can lead to the development of new software and/or improvements to an existing codebase. After skimming through the code, we hope you will agree that it is well written and easy to extend. Let us know if you disagree, either by emailing us or by creating a GitHub issue. We p...

The SureChEMBL map file is out

As many of you know, SureChEMBL taps into the wealth of knowledge hidden in patent documents. More specifically, SureChEMBL extracts and indexes chemistry from the full-text patent corpus (EPO, WIPO and USPTO; JPO titles and abstracts only) by means of automated text- and image-mining, on a daily basis. We recently hosted a webinar about it, which turned out to be very popular; for those who missed it, the video and slides are here. Besides the interface, SureChEMBL compound data can be accessed in various ways, such as via UniChem and PubChem. The full compound dump is also available as a flat-file download from our ftp server. Since the release of the SureChEMBL interface last September, we have received numerous requests for a way to access compound and patent data in batch. Typical use cases would include retrieving all compounds for a list of patent IDs or, vice versa, retrieving all patents from which one or more compounds have been extracted. As a...
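Both batch use cases boil down to a two-way index between compounds and patents. The sketch below assumes, purely for illustration, a tab-separated file with one (compound ID, patent ID) pair per line and no header row; the actual layout of the SureChEMBL map file may differ, so check the file before relying on this. The helper names `build_indexes` and `lookup` are hypothetical, and the IDs in the sample are placeholders, not real SureChEMBL or patent identifiers.

```python
import csv
import io
from collections import defaultdict


def build_indexes(lines):
    """Build compound->patents and patent->compounds indexes.

    ASSUMPTION: each row is `compound_id<TAB>patent_id`, with no header.
    """
    by_compound = defaultdict(set)
    by_patent = defaultdict(set)
    for compound_id, patent_id in csv.reader(lines, delimiter="\t"):
        by_compound[compound_id].add(patent_id)
        by_patent[patent_id].add(compound_id)
    return by_compound, by_patent


def lookup(index, keys):
    """Union of the values for all `keys`; unknown keys contribute nothing."""
    result = set()
    for key in keys:
        result |= index.get(key, set())  # .get() does not create new entries
    return result


# Placeholder data standing in for the real map file.
sample = io.StringIO("CPD_A\tPAT_1\nCPD_B\tPAT_1\nCPD_A\tPAT_2\n")
by_compound, by_patent = build_indexes(sample)

print(sorted(lookup(by_patent, ["PAT_1"])))    # ['CPD_A', 'CPD_B']
print(sorted(lookup(by_compound, ["CPD_B"])))  # ['PAT_1']
```

For the real download, one would pass an open file handle (opened with `newline=""`, as the `csv` module recommends) instead of the in-memory sample. For a file of this size, loading the pairs into a database with an index on each column may be more practical than holding everything in Python sets.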

Beaker now officially part of ChEMBL web services

We have mentioned Beaker (a.k.a. the ChEMBL cheminformatics utility web service) several times on the blog (here, here and here), but have not devoted an entire post to it. Well, here it is. Beaker - what's this? It's a small utility that makes chemistry software available securely over https. You no longer need to install a chemical toolkit in order to convert your molfile to SMILES or calculate descriptors. If you have an internet connection (if you can read this, chances are you do), you can use Beaker. We recommend you head over to the interactive online documentation (https://www.ebi.ac.uk/chembl/api/utils/docs) to see the full list of functionality it offers and try it with your own data. Which toolkits are used by Beaker? Under the hood, Beaker exposes the functionality of the RDKit cheminformatics library. Beaker's optical structure recognition methods use the OSRA library. Do I need an API Key? As long as you are m...
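Since Beaker is just a web service, any HTTP client will do. The sketch below uses only the Python standard library to POST a molfile to a Beaker endpoint. The base URL is derived from the documentation link above, but the endpoint name `ctab2smiles` and the raw-body convention are assumptions here; check the interactive documentation for the exact endpoint names and expected inputs. The helpers `build_beaker_request` and `call_beaker` are hypothetical. Building the request is kept separate from sending it, so the request can be inspected without a network connection.

```python
import urllib.request

# Derived from the documentation URL above (the /docs page lives under it).
BASE_URL = "https://www.ebi.ac.uk/chembl/api/utils"


def build_beaker_request(endpoint: str, payload: str) -> urllib.request.Request:
    """Build a POST request carrying `payload` (e.g. a molfile) to a Beaker endpoint.

    ASSUMPTION: the endpoint accepts the raw payload as the request body;
    verify the endpoint name and input format against the online docs.
    """
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=payload.encode("utf-8"),
        method="POST",
    )


def call_beaker(endpoint: str, payload: str, timeout: float = 30.0) -> str:
    """Send the request over https and return the response body as text."""
    request = build_beaker_request(endpoint, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example (needs network access; endpoint name is an assumption):
#   smiles = call_beaker("ctab2smiles", open("molecule.mol").read())
```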