ChEMBL 24 Released!

We are pleased to announce the release of ChEMBL 24. This version of the database, prepared on 23/04/2018, contains:

    2,275,906 compound records
    1,828,820 compounds (of which 1,820,035 have mol files)
    15,207,914 activities
    1,060,283 assays
    12,091 targets
    69,861 documents

Data can be downloaded from the ChEMBL ftp site. Please see the ChEMBL_24 release notes for full details of all changes in this release.

Change in data model and addition of activity properties and supplementary data:

A new data submission format and database loader have been implemented. The new deposition system allows more advanced functionality, including the ability to update previously deposited data sets and the ability to deposit activity data against existing ChEMBL compound or assay collections. This mean

Striving for Perfect Representation of Chemical Structures – is this possible?

It probably goes without saying that at ChEMBL we want to make all our data as accurate and useful as possible. With this in mind, we have spent many hours over the last few years curating, in particular, the structures of marketed drugs and clinical candidates. We aren't alone in this: more than 5 years ago people were coming across the same problems, as highlighted in this blog post by ChemConnector on Fluvastatin.

Our drug curation is an ongoing and probably never-ending task, but to be honest it has proved a lot more difficult than we expected. This is for several reasons:

Firstly, where do you go to find the definitive structure of a molecule? One would have thought this would be easy, but even sources such as INN and USAN don't always agree. For example, for Telavancin the USAN_data_sheet shows a difference in the nitrogen and carbon counts in the structure images compared with the images in the INN document (although the molecular formulae are the s