
Posts

Showing posts from 2017

Using autoencoders for molecule generation

Some time ago we came across the following paper: https://arxiv.org/abs/1610.02415, so we decided to take a closer look at it and train the described model using ChEMBL data. Luckily, we also found two open-source implementations of the model: the original authors' one, https://github.com/HIPS/molecule-autoencoder, and https://github.com/maxhodak/keras-molecules. We decided to rely on the latter, as the original author states that it might be easier to have greater success using it.
What is the paper about? It describes how molecules can be generated and specifically designed using autoencoders.
First, we give a simple, non-technical introduction for those who are not familiar with autoencoders, and then go through an IPython notebook showing a few examples of how to use the model.
Autoencoder introduction
Autoencoders are one of the many popular unsupervised deep learning algorithms used nowadays across many different fields and for many purposes. They work with two main joint blocks, a…
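To make the encoder/decoder idea a little more concrete for molecules: the paper works on SMILES strings, which have to be turned into fixed-size numeric input for the encoder and recovered from the decoder's output. Below is a minimal pure-Python sketch of that step using one-hot encoding. It is not code from keras-molecules; the 120-character maximum length, the space used as padding and the function names (build_charset, one_hot_encode, one_hot_decode) are our own illustrative assumptions.

```python
"""Sketch: one-hot encoding of SMILES strings for a character-level autoencoder.

Assumptions (not taken from the post): the character set is built from the
training SMILES plus a space used as padding, and strings are padded to 120
characters. Function names are hypothetical.
"""

PAD = " "  # assumed padding character


def build_charset(smiles_list):
    """Return a sorted list of every character seen, plus the padding character."""
    return sorted(set("".join(smiles_list)) | {PAD})


def one_hot_encode(smiles, charset, max_len=120):
    """Encode a SMILES string as a max_len x len(charset) matrix of 0/1 values."""
    if len(smiles) > max_len:
        raise ValueError(f"SMILES longer than {max_len} characters: {smiles}")
    index = {char: i for i, char in enumerate(charset)}
    padded = smiles.ljust(max_len, PAD)
    # One row per character position, with a single 1 in that character's column.
    return [
        [1 if index[char] == j else 0 for j in range(len(charset))]
        for char in padded
    ]


def one_hot_decode(matrix, charset):
    """Map each row back to its highest-scoring character and strip the padding.

    Taking the row-wise maximum also works for per-position probabilities a
    decoder would output, not only for exact one-hot rows.
    """
    chars = [charset[row.index(max(row))] for row in matrix]
    return "".join(chars).rstrip(PAD)


if __name__ == "__main__":
    training = ["CCO", "c1ccccc1", "CC(=O)O"]
    charset = build_charset(training)
    encoded = one_hot_encode("CC(=O)O", charset)
    print(len(encoded), len(encoded[0]))  # 120 8
    print(one_hot_decode(encoded, charset))  # CC(=O)O
```

The encoder compresses matrices like this into a short continuous vector, and the decoder is trained to reproduce the original matrix from that vector; the argmax decoding above is the simplest way to read a SMILES string back out.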

ChEMBL web services webinar 4pm 12th July

We are pleased to announce the next webinar in our ChEMBL webinar series:

ChEMBL, programmatically (part of the EMBL-EBI, programmatically: take a REST from manual searches webinar series) will be held at 4pm (BST) on 12th July.

The webinar will provide an overview of the ChEMBL API and its use, including how to execute API calls from the browser; where to find documentation; how to use filtering and pagination; available output formats; and scripting examples in Python, Bash and R. We will also give examples of how the API can be used to create reusable web components and how it can be integrated into tools such as KNIME and Slack. The webinar will assume a degree of familiarity with the data in ChEMBL, so new users are advised that an introductory ChEMBL webinar is also available: https://www.ebi.ac.uk/training/online/course/chembl-walkthrough-webinar
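As a flavour of the kind of Python scripting the webinar covers, here is a minimal sketch of building filtered, paginated API URLs and walking through result pages. It is not official client code: the base URL, the Django-style filter name pref_name__icontains, the limit parameter, the .json suffix and the page_meta/next response fields are our assumptions and should be checked against the API documentation, and build_url and iterate_pages are hypothetical helper names.

```python
"""Sketch: building ChEMBL API URLs and following pagination links.

Assumptions: the base URL, Django-style filters (field__lookup=value), the
limit/offset parameters, the ".json" suffix and the "page_meta"/"next"
response fields should be verified against the official API documentation.
"""
from urllib.parse import urlencode

HOST = "https://www.ebi.ac.uk"
BASE_URL = HOST + "/chembl/api/data"  # assumed base URL of the ChEMBL API


def build_url(resource, fmt="json", **params):
    """Return e.g. .../molecule.json?limit=5&pref_name__icontains=aspirin."""
    url = f"{BASE_URL}/{resource}.{fmt}"
    # Sort parameters so the same query always produces the same URL.
    query = urlencode(sorted(params.items()))
    return f"{url}?{query}" if query else url


def iterate_pages(first_url, fetch):
    """Yield successive result pages, following page_meta['next'] until it is empty.

    fetch is any callable mapping a URL to a parsed JSON dict; injecting it
    keeps the pagination logic testable without network access.
    """
    url = first_url
    while url:
        page = fetch(url)
        yield page
        next_link = page.get("page_meta", {}).get("next")
        if not next_link:
            url = None
        elif next_link.startswith("http"):
            url = next_link
        else:
            url = HOST + next_link  # assumed: 'next' may be a host-relative path
```

Injecting fetch keeps the example runnable offline; for real calls it could be a small function that opens the URL with urllib.request.urlopen and parses the response body with json.load.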

To register for this webinar, please see http://ow.ly/hnxn30dlyMc

The ChEMBL Team

ChEMBL release 23, technical aspects.

In this blog post, we would like to highlight some important technical improvements we've deployed as part of the ChEMBL 23 release. You may find them useful if you work with ChEMBL data via FTP downloads and the API.
1. FPS format support.
Many users download our SDF file containing all ChEMBL structures in order to compute fingerprints as an immediate next step. To help them, we decided to publish precomputed fingerprints in the FPS text fingerprint format. The FPS format was developed by Andrew Dalke to "define and promote common file formats for storing and exchanging cheminformatics fingerprint data sets". It is used by chemfp, RDKit, OpenBabel and CACTVS, and we believe it deserves promotion. The computed fingerprints are 2048-bit, radius 2 Morgan fingerprints, which we think is the most popular and generic type, but please let us know in the comments if another type would serve you better. We are fully aware that fingerprint type can heavily depend on the…
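For readers new to FPS, the sketch below reads fingerprints from FPS-formatted lines and compares two of them with the Tanimoto coefficient, using only the Python standard library. It assumes the common FPS layout ('#'-prefixed header lines such as #FPS1 and #num_bits=2048, then one tab-separated hex fingerprint and identifier per line); read_fps and tanimoto are illustrative names, not part of chemfp or RDKit.

```python
"""Sketch: reading FPS fingerprint lines and comparing fingerprints.

Assumes the usual FPS layout: '#'-prefixed header lines (e.g. '#FPS1',
'#num_bits=2048') followed by records of the form '<hex fingerprint>\t<id>'.
Function names are hypothetical.
"""


def read_fps(lines):
    """Return (header dict, list of (identifier, int fingerprint)) from FPS lines."""
    header, records = {}, []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Keep key=value header lines; '#FPS1' has no '=' and is skipped.
            key, sep, value = line[1:].partition("=")
            if sep:
                header[key] = value
            continue
        fields = line.split("\t")
        hex_fp, identifier = fields[0], fields[1]
        # Little-endian keeps bit 0 in the first byte; Tanimoto does not depend
        # on the byte order chosen, as long as it is consistent.
        records.append((identifier, int.from_bytes(bytes.fromhex(hex_fp), "little")))
    return header, records


def tanimoto(fp_a, fp_b):
    """Shared on-bits divided by on-bits set in either fingerprint."""
    union = bin(fp_a | fp_b).count("1")
    if union == 0:
        return 0.0
    return bin(fp_a & fp_b).count("1") / union


if __name__ == "__main__":
    sample = ["#FPS1\n", "#num_bits=8\n", "ff\tA\n", "0f\tB\n"]
    header, records = read_fps(sample)
    print(header["num_bits"], tanimoto(records[0][1], records[1][1]))  # 8 0.5
```

For the full ChEMBL file, a dedicated tool such as chemfp will be far faster and more memory-efficient than holding every fingerprint as a Python integer; the sketch only shows what the format contains. If the download is gzip-compressed, it can be passed to read_fps through gzip.open(path, "rt"), with path pointing at wherever the file was saved.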

Post-doctoral positions

Two exciting post-doctoral projects are available via the ESPOD and EBPOD schemes, run by the European Bioinformatics Institute jointly with the Sanger Institute and the NIHR Cambridge Biomedical Research Centre (BRC), respectively. Post-doctoral fellows appointed via these schemes work on projects under the joint supervision of faculty members from EMBL-EBI and the Sanger or BRC, as appropriate. Specifically:
(a) In collaboration with Mathew Garnett at the Sanger Institute, a project to exploit the potential of combining large-scale drug sensitivity screening platforms with the chemogenomics resources and expertise at the EBI. A full description of the project can be found here: http://www.ebi.ac.uk/sites/ebi.ac.uk/files/groups/research_office/ESPOD2017/05%20Leach-Garnett.pdf. Applications can be made via the relevant link here: http://www.ebi.ac.uk/research/postdocs/espods
(b) In collaboration with Vasilis Kosmoliaptsis in the Department of Surgery at Addenbrooke’s hospital, to capitalize on o…

ChEMBL_23 released

We are pleased to announce the release of ChEMBL_23. This release was prepared on 1st May 2017 and contains:
* 2,101,843 compound records
* 1,735,442 compounds (of which 1,727,112 have mol files)
* 14,675,320 activities
* 1,302,147 assays
* 11,538 targets
* 67,722 source documents
Data can be downloaded from the ChEMBL ftp site: ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_23
Please see ChEMBL_23 release notes for full details of all changes in this release: ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_23/chembl_23_release_notes.txt

DATA CHANGES SINCE THE LAST RELEASE
In addition to the regular updates to the Scientific Literature, FDA Orange Book and USP Dictionary of USAN and INN Investigational Drug Names and Clinical Candidates, this release of ChEMBL also includes the following new data:
Patent Bioactivity Data
With funding from the NIH Illuminating the Druggable Genome project (https://commonfund.nih.gov/idg), we have extracted bioactivit…

Technical internships at ChEMBL

We are looking for skilled Computer Science (and related fields) students with strong programming skills to join our team for 3-6 month internships. This is not necessarily a summer internship program: you can start whenever is convenient for you after being accepted. Please take a look at some of the research ideas / candidate profiles below:
1. Java programmer - we are looking for a person with experience in Java to develop a prototype of new KNIME nodes for interacting with the ChEMBL API. Experience with REST and/or KNIME is a plus but not a requirement - you can learn it during your internship. Very importantly, you should be excited about UX and about creating user-friendly, pragmatic GUIs.
2. C++ programmer - we would like to invite a person passionate about C++ and pattern recognition / image processing to experiment with optimising the open-source OSRA code. OSRA is like OCR but for molecules. We want to make it faster and more accur…