
Posts

ChEMBL release 23, technical aspects.

In this blog post, we would like to highlight some important technical improvements we've deployed as part of the ChEMBL 23 release. You may find them useful if you work with ChEMBL data via FTP downloads and the API. 1. FPS format support. Many users download our SDF file containing all ChEMBL structures in order to compute fingerprints as an immediate next step. We decided to help them by publishing precomputed fingerprints in the FPS text fingerprint format. The FPS format was developed by Andrew Dalke to "define and promote common file formats for storing and exchanging cheminformatics fingerprint data sets". It is used by chemfp, RDKit, OpenBabel and CACTVS, and we believe it deserves promotion. The computed fingerprints are 2048-bit, radius-2 Morgan fingerprints, which we think are the most popular and generic type, but please let us know in the comments if another type would serve better. We are fully aware that fingerprint typ...
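To give a feel for how an FPS file can be consumed without any cheminformatics toolkit, here is a minimal sketch in Python. It assumes the basic FPS layout: header lines starting with `#`, followed by one record per line holding a hex-encoded fingerprint, a tab, and an identifier. The sample data is a toy 16-bit example with hypothetical identifiers (`CHEMBL_A`, `CHEMBL_B`), not real ChEMBL fingerprints, and the helper names `parse_fps` and `tanimoto` are ours.

```python
# Toy FPS content: 16-bit fingerprints and made-up identifiers (assumptions,
# not taken from the real ChEMBL 23 file, which uses 2048-bit fingerprints).
SAMPLE_FPS = "#FPS1\n#num_bits=16\n0f00\tCHEMBL_A\n0300\tCHEMBL_B\n"


def parse_fps(lines):
    """Yield (identifier, fingerprint_bytes) for each record line."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and header lines such as "#FPS1" or "#num_bits=...".
        if not line or line.startswith("#"):
            continue
        hex_fp, identifier = line.split("\t")[:2]
        yield identifier, bytes.fromhex(hex_fp)


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two equal-length byte fingerprints."""
    common = sum(bin(x & y).count("1") for x, y in zip(fp_a, fp_b))
    total = sum(bin(x | y).count("1") for x, y in zip(fp_a, fp_b))
    # Two all-zero fingerprints share no bits; report 0.0 by convention here.
    return common / total if total else 0.0


fingerprints = dict(parse_fps(SAMPLE_FPS.splitlines()))
similarity = tanimoto(fingerprints["CHEMBL_A"], fingerprints["CHEMBL_B"])
print(similarity)  # 0.5: two shared bits out of four set in either fingerprint
```

For a file with millions of records, a dedicated tool such as chemfp will be far faster; the sketch is only meant to show that the format is simple enough to read by hand.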

Post-doctoral positions

Two exciting post-doctoral projects are available via the ESPOD and EBPOD schemes, which link the European Bioinformatics Institute with the Sanger Institute and the NIHR Cambridge Biomedical Research Centre (BRC), respectively. Post-doctoral fellows appointed via these schemes work on projects under the joint supervision of faculty members from EMBL-EBI and the Sanger or BRC as appropriate. Specifically: (a) In collaboration with Mathew Garnett at the Sanger Institute, a project to exploit the potential of combining large-scale drug sensitivity screening platforms with the chemogenomics resources and expertise at the EBI. A full description of the project can be found here: http://www.ebi.ac.uk/sites/ebi.ac.uk/files/groups/research_office/ESPOD2017/05%20Leach-Garnett.pdf. Applications can be made via the relevant link here: http://www.ebi.ac.uk/research/postdocs/espods (b) In collaboration with Vasilis Kosmoliaptsis in the Department of Surgery at Addenbrooke’s hospi...

ChEMBL_23 released

We are pleased to announce the release of ChEMBL_23. This release was prepared on 1st May 2017 and contains:

* 2,101,843 compound records
* 1,735,442 compounds (of which 1,727,112 have mol files)
* 14,675,320 activities
* 1,302,147 assays
* 11,538 targets
* 67,722 source documents

Data can be downloaded from the ChEMBL ftp site: ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_23

Please see ChEMBL_23 release notes for full details of all changes in this release: ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_23/chembl_23_release_notes.txt

DATA CHANGES SINCE THE LAST RELEASE

In addition to the regular updates to the Scientific Literature, FDA Orange Book and USP Dictionary of USAN and INN Investigational Drug Names and Clinical Candidates, this release of ChEMBL also includes the following new data:

Patent Bioactivity Data

With funding from the NIH Illuminating the Druggable Genome project (h...

Technical internships at ChEMBL

We are looking for Computer Science (and related fields) students with strong programming skills to join our team for 3-6 month internships. This is not necessarily a summer internship program; you can start whenever is convenient for you after being accepted. Please take a look at some of the research ideas / candidate profiles below: 1. Java programmer - we are looking for a person with experience in Java to develop a prototype of new KNIME nodes for interacting with the ChEMBL API. Experience with REST and/or KNIME is a plus but not a requirement - you can learn it during your internship. An important thing to note is that you should be excited about UX and about creating user-friendly, pragmatic GUIs. 2. C++ programmer - we would like to invite a person passionate about C++ and pattern recognition / image processing to experiment with optimising the open-source OSRA code. OSRA is like OCR but for molecules. We want to ...

Finding Compounds in Databases using UniChem

Have you ever identified an interesting compound and wondered what else is known about it? For example, is there any bioactivity data on it in ChEMBL or PubChem? Is there any toxicity data on it (CompTox)? Then, having found interesting data on a compound, have you wondered whether it can be purchased or whether it has been patented? All this can be done using UniChem. Interested? Come along to our webinar on 29th March at 2pm BST (3pm CEST, 9am EDT). You will, however, need to register by emailing chembl-help. Places are limited, so please let us know as soon as possible if you register but are then unable to attend. If you want to know more about UniChem, please read on. UniChem (https://www.ebi.ac.uk/unichem/) is a simple system we have developed to cross-reference compounds across databases, both internal to EMBL-EBI and external. Currently we have cross-references to 140 million compounds in 30 different databases. Information about t...