

Showing posts from 2015

SureChEMBL: A New Hope

US-D254080-S SureChEMBL has disrupted the field of patent chemistry by liberating chemical structures and knowledge locked in text and images, and by making the compound-patent associations freely and fully searchable and accessible on a daily basis to everyone: academics, IP professionals, content providers, software vendors, biotechs, small and big pharma, and related chemical industries. The speed, scale and scope of the data are unprecedented for a public resource. SureChEMBL has been around for less than two years; during this time, it has evolved into a full-blown chemistry resource provided by the EMBL-EBI: the SureChEMBL interface was revamped and released last year, including combined keyword and structure-based queries against the annotated patent corpus. All chemistry is integrated with UniChem and there are several ways to access the data in bulk, including flat files and a data client. Very soon, the data will be fully integrated and available via the

Advanced keyword and structure searches with SureChEMBL

Previously in the SureChEMBL series, we described how to access SureChEMBL data in bulk, offline and locally. So, you may ask, what is the point in using the SureChEMBL web interface? Well, how about the unprecedented functionality that allows you to submit very granular queries by combining: i) Lucene fields against full-text and bibliographic metadata and ii) advanced structure query features against the annotated compound corpus - at the same time? Let’s see each one separately first. Lucene-powered keyword searching: you may use the main text box for simple keyword-based patent searches, such as ‘Apple’, ‘diabetes’ or even 'chocolate cake' (the patent corpus as a recipe book is a new use-case here). You will get a lot of results and probably a lot of noise. With Lucene fields, you can slice and dice a query by indicating specific patent sections and bibliographic metadata, such as date/year of filing or publication, assignee, patent classification code,
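As a rough illustration of what a fielded Lucene query looks like, here is a minimal Python sketch that assembles one from field/value pairs. It relies only on standard Lucene syntax (`field:value`, quoted phrases, `AND`). The field names `title`, `assignee` and `pub_year` are hypothetical placeholders, not the actual SureChEMBL field codes, which should be taken from the interface's own help pages.

```python
def lucene_term(field, value):
    """Build one fielded Lucene clause; multi-word values become quoted phrases."""
    value = value.replace('"', '\\"')  # escape embedded double quotes
    if " " in value:
        value = f'"{value}"'
    return f"{field}:{value}"


def lucene_and(*clauses):
    """Join clauses so that every one of them must match."""
    return " AND ".join(clauses)


# Field names below are assumptions for illustration only.
query = lucene_and(
    lucene_term("title", "diabetes"),
    lucene_term("assignee", "Example Pharma"),
    lucene_term("pub_year", "2015"),
)
print(query)
```

The resulting string is the sort of text you would type into the main search box; restricting each term to a field is what cuts down the noise of a plain keyword search.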

Is ChEMBL down or is it just me?

Have you ever wondered whether your favorite resource of bioactive molecule data is down, or whether some temporary network issue is making it unavailable from your end? There are many online tools that can help in such cases (for example or similar websites). We now provide a much better solution, however: the ChEMBL status page. As you may notice, the status page is hosted on GitHub, so it is outside of the EBI infrastructure. This means that even when the core ChEMBL websites are down, you should still be able to see the status page (assuming that GitHub is online, which is quite a reasonable assumption, despite occasional incidents). We've placed a link to the status page at the bottom of the left-side navigation menu on the main ChEMBL web page, as it provides some useful information even when everything is fine. The status page presents information about the health of ChEMBL's most critical

Paper: Managing expectations: assessment of chemistry databases generated by automated extraction of chemical structures from patents

Our collaborators at GSK have just published an Open Access paper in the Journal of Cheminformatics. It is a comparative study of the quality of chemistry extraction from patent documents and includes patent chemistry sources derived by automated text-mining, such as SureChEMBL and the IBM/NIH data set. Among other things, the paper provides a useful detailed overview of SureChEMBL's chemistry annotation specifications. While conducting this study, we realised that this task is far from trivial for several reasons: The patent corpus is inherently noisy, ambiguous and error-rich. There are diverse use cases and accuracy expectations when it comes to chemistry extracted from patents. Not all the chemistry found in a patent document is of equal importance. Compound standardisation variants such as stereoisomers, tautomers, salts and mixtures are always an issue. There is a distinct lack of an open Gold Standard when it comes to standardised chemistry extracted fro

Wanted - Web Developer!

Just a reminder that we are currently looking for a Web Application Developer to join the ChEMBL team at EMBL-EBI. The closing date for this vacancy is 4th Oct, so hurry and apply! The role is primarily to develop a series of web-based applications and interfaces for the ChEMBL chemogenomic resources. The role also involves the development, maintenance and documentation of these tools, and supporting their usage within the EBI and externally. It will also involve some requirement gathering and use-case development. Experience of Python and JavaScript is required, as is experience of working with web frameworks such as Django. A sound knowledge of relational databases (primarily Oracle), SQL, PL/SQL, REST and the HTTP protocol is also a requirement. Experience of contributing to open software projects and documenting them, on GitHub for example, is desirable. Applicants should have a good understanding of best practice in software engineering, rapid development cycle work, have

Blast from the past - 1000th blog post!

To celebrate the 1000th post, we've decided to take a journey back in time. So, what you see above is a timeline* showing the most important blog posts published on the ChEMBL blog. The posts delineate major events and milestones in the group’s 7-year history and highlight the contributions and impact to the community. Posts on ChEMBL updates, publications, innovative software applications and popular resources are all included there. We hope you will enjoy skimming through it as much as we did. If you have any favourite blog post published here, let us know in the comments. Just remember that this journey continues; here’s to the next chiliad of exciting blog posts! The ChEMBL team. * The timeline was prepared using the excellent timelineJS library by .

KNIME chemoinformatics meetup at the EBI

We’re co-organising a KNIME chemoinformatics workshop at the EBI on Monday 5th October. This is a regular meeting that takes place the day before the biannual UK-QSAR meeting. There will be informal discussions on the current and future state of the KNIME chemoinformatics nodes, along with updates by the community and the KNIME guys. There will also be talks on the integration of KNIME with the ChEMBL resources and the Open PHACTS platform. More details and agenda here; to register, fill in your details here. George

Online Resources

We would like to announce the recent release of two online training resources for ChEMBL and UniChem. For ChEMBL, we have developed ‘ChEMBL: Exploring bioactive drug-like molecules’, which will walk you through how to use the interface, step-by-step. It tackles topics such as target searching, compound searching, web services and data downloads. The course also gives you a chance to test your knowledge throughout. Additionally, Jon Chambers has created the 'UniChem: Quick Tour' course. This course will give users a basic understanding of UniChem and the benefits it can bring to navigating small molecule resources. It will also walk you through how to conduct simple searches using UniChem and the UniChem Connectivity Search feature. I'd also like to remind you that we store recordings of our past webinars, in case you missed them. You can access these anytime and they can be found here: