
SQLite versions now available for all ChEMBL releases

Last year, when I wanted to look at the evolution of ChEMBL over time, I found it quite tricky. There were 34 releases at that point, but only one database format was available for all 34 of them: MySQL. Downloading and installing all of these versions was painful, though, as the contents of the .tar.gz files are mostly, but not exactly, consistent, and the appropriate install command has changed a bit over the years.

Meanwhile, since ChEMBL 19 we have provided SQLite versions of ChEMBL. SQLite is one of the few formats recommended for datasets by the US Library of Congress (alongside JSON, CSV and XML). It is also inherently simpler to deal with, as it doesn't require the user to set up a server and import the database; instead, we provide a .sqlite file which the user can use straightaway after unzipping the .tar.gz. A member of our community, Charles Tapley Hoyt, has built on this with the ChEMBL Downloader project, which manages downloading and unzipping the .sqlite file. This supports reproducible analyses of ChEMBL.
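To illustrate the "no server" point, here is a minimal sketch using only Python's standard library (`tarfile` and `sqlite3`). It assumes the downloaded archive contains exactly one SQLite file with a `.db` or `.sqlite` suffix somewhere inside it; the exact layout of the archives is an assumption here, and the function names `open_chembl_sqlite` and `list_tables` are hypothetical helpers, not part of ChEMBL or ChEMBL Downloader.

```python
import sqlite3
import tarfile
from pathlib import Path
from typing import List


def open_chembl_sqlite(archive: Path, dest: Path) -> sqlite3.Connection:
    """Extract a ChEMBL SQLite .tar.gz into `dest` and open the database inside.

    Assumption (hypothetical helper): the archive holds exactly one file
    ending in .db or .sqlite.
    """
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # Use the safer "data" extraction filter where this Python version has it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    candidates = [
        p for p in dest.rglob("*") if p.is_file() and p.suffix in {".db", ".sqlite"}
    ]
    if len(candidates) != 1:
        raise FileNotFoundError(
            f"expected one SQLite file in {archive}, found {len(candidates)}"
        )
    # No server needed: open the file directly, read-only so the release stays untouched.
    uri = candidates[0].resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def list_tables(conn: sqlite3.Connection) -> List[str]:
    """Return the names of all tables in the database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

Usage would be something like `conn = open_chembl_sqlite(Path("chembl_1_sqlite.tar.gz"), Path("chembl_1"))` followed by ordinary SQL through `conn.execute(...)`; the archive filename is illustrative only. Opening in read-only URI mode is a design choice so that exploratory queries can't accidentally modify a reference copy of a release.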

We have now generated SQLite versions for all ChEMBL releases, that is, from version 1 onwards. Version 1 of ChEMBL was released in 2009, while SQLite3 (still the current version) was released in 2004, so we haven't upset the space-time continuum, merely filled a gap. Having these versions available will make it simpler to carry out analyses across ChEMBL versions (such as the scaffold analysis below), and perhaps also aid the preservation of these early datasets. Charles has already updated the ChEMBL Downloader to support these versions and has looked at changes over time, so check them out and take a trip back to those early days of ChEMBL.
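Because each release is now a single SQLite file, a cross-version comparison can reduce to a loop over files. The sketch that follows assumes you already have the releases extracted locally; the paths, the hypothetical `row_counts` helper, and the choice of table are all assumptions. Since the schema has changed across releases, it reports `None` for any release in which the requested table does not exist rather than failing.

```python
import sqlite3
from contextlib import closing
from pathlib import Path
from typing import Dict, Optional


def row_counts(db_paths: Dict[int, Path], table: str) -> Dict[int, Optional[int]]:
    """Count rows of `table` in each release; None where the table does not exist."""
    counts: Dict[int, Optional[int]] = {}
    for version, path in sorted(db_paths.items()):
        # Open each release read-only; closing() ensures the connection is closed.
        uri = path.resolve().as_uri() + "?mode=ro"
        with closing(sqlite3.connect(uri, uri=True)) as conn:
            present = conn.execute(
                "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
                (table,),
            ).fetchone()
            if present is None:
                # The schema has changed across releases, so a table may be missing.
                counts[version] = None
                continue
            # Identifiers can't be bound as parameters; quote and escape instead.
            quoted = '"' + table.replace('"', '""') + '"'
            (n_rows,) = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()
            counts[version] = n_rows
    return counts
```

For example, `row_counts({v: Path(f"chembl_{v}.db") for v in range(1, 35)}, "activities")` would tabulate one count per release, assuming files with those (hypothetical) names exist locally. Checking `sqlite_master` first keeps the loop robust to schema differences between early and recent releases.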
