Skip to main content

Tastypie & Chempi



One of the immediate consequences of refactoring our webservices using Django, Tastypie and related approaches (as described here) is that we can run them on almost any database backend. Django abstracts communication with database and using custom QueryManagers we were able to implement chemisty-specific opererations, such as substructure and similarity search in a database agnostic manner.

This means, that if we want, we can use only Open Source components (such as Postgres and RDKit), or elect to use optimised commercially sourced software as appropriate. However, what if we go one step further and try to use Open Hardware as well? This is exactly what we've just done! We managed to install full ChEMBL 17 on raspbery pi.

Some frequently asked questions (at lease those that have been asked internally) and technical details are below:

1. How much space does it take?

12 Gb, including OS, data and all relevant software. Unfortunately we a used 32 Gb SD card so this is size if you would like to use our cloned disk image.

EDIT: Compressed image takes 4.13 Gb.

2. What OS is it running?

Raspbian, free operating system based on Debian.

3. Is it slow?

We haven't make any benchmarks yet. Obviously it's slower than our online web services - but then it's a lot cheaper. On the other hand, performing some sample requests we can say that performance is certainly acceptable; and there is a lot room for improvements - raspberry pis can be easily overclocked from 700 MHz to 1GHz and according to some benchmarks this can give rise to doubling of application speed in some cases. The SD card we used is not the fastest one as well. Finally, all caching is disabled because we wanted to save disk space but using database caching from Django caching framework should give further major improvements - so maybe use the 32 Gb image after all.

Types of request that chempi can be slower on are:

 - Image generation, but if we replace image with JSON from which image can be generated using HTML5 canvas on the client side (the way we generated images in our game) it can be much faster. More about this topic in future blog post.
- Queries using aggregate functions such as COUNT (it seems that we need to optimise our postgres db by adding some more indexes).
- Substructure and similarity search - again, caching, over-clocking and some database and cartridge (choosing faster fingerprints) optimization should solve all the problems. "Premature optimization is a root of all evil", so we first wanted to have a proof of concept that just works, not necessarily works super fast.

4. Can I make my own chempi?

Yes, we are planning to share our SD card image, we will probably use BitTorrent protocol to do this due to image size, and some issues we have faced with distribution of the myChEMBL. We do remember that not everyone has mega-fast broadband!

5. Is chempi useful at all?

Although we think it is interesting as a proof of concept having chemical database on such small and open source hardware, we do think this may have some interesting future real-world applications:

 - plugging our chempi to local network makes it immediately accessible to other computers. So this is a zero configuration demonstration of ChEMBL.
- analogically to the thesis included in this paper, it can encourage cheminformatics education on low cost ARM hardware.
- raspberry can be easily enhanced with camera to perform image recognition. This, combined with software like OSRA can give ability so scan compound images and search them in database.
- adding some e-ink display (for example, jailbroken Kindle?) can produce very interesting small machine...

6. What are some of the technical details?

To deploy our webservices (which are just another Django application) we've used Gunicorn as a server, which in turn connects to NGINX via standard unix pipe. To make it work as a deamon and launch on machine startup, we've used Supervisor. We believe this is ideal way to deploy Django not only on raspberry but on all production machines to if you like to run chembl webservices locally in your company/academia we suggest to do it this way.


michal

Comments

Popular posts from this blog

SureChEMBL Available Now

Followers of the ChEMBL group's activities and this blog will be aware of our involvement in the migration of the previously commercially available SureChem chemistry patent system, to a new, free-for-all system, known as SureChEMBL. Today we are very pleased to announce that the migration process is complete and the SureChEMBL website is now online. SureChEMBL provides the research community with the ability to search the patent literature using Lucene-based keyword queries and, much more importantly, chemistry-based queries. If you are not familiar with SureChEMBL, we recommend you review the content of these earlier blogposts here and here . SureChEMBL is a live system, which is continuously extracting chemical entities from the patent literature. The time it takes for a new chemical in the patent literature to become searchable in the SureChEMBL system is 1-2 days (WO patents can sometimes take a bit longer due to an additional reprocessing step). At time of writi

New SureChEMBL announcement

(Generated with DALL-E 3 ∙ 30 October 2023 at 1:48 pm) We have some very exciting news to report: the new SureChEMBL is now available! Hooray! What is SureChEMBL, you may ask. Good question! In our portfolio of chemical biology services, alongside our established database of bioactivity data for drug-like molecules ChEMBL , our dictionary of annotated small molecule entities ChEBI , and our compound cross-referencing system UniChem , we also deliver a database of annotated patents! Almost 10 years ago , EMBL-EBI acquired the SureChem system of chemically annotated patents and made this freely accessible in the public domain as SureChEMBL. Since then, our team has continued to maintain and deliver SureChEMBL. However, this has become increasingly challenging due to the complexities of the underlying codebase. We were awarded a Wellcome Trust grant in 2021 to completely overhaul SureChEMBL, with a new UI, backend infrastructure, and new f

ChEMBL & SureChEMBL anniversary symposium

  In 2024 we celebrate the 15th anniversary of the first public release of the ChEMBL database as well as the 10th anniversary of SureChEMBL. To recognise this important landmark we are organising a two-day symposium to celebrate the work achieved by ChEMBL and SureChEMBL, and look forward to its future.   Save the date for the ChEMBL 15 Year Symposium October 1-2, 2024     Day one will consist of four workshops, a basic ChEMBL drug design workshop; an advanced ChEMBL workshop (EUbOPEN community workshop); a ChEMBL data deposition workshop; and a SureChEMBL workshop. Day two will consist of a series of talks from invited speakers, a few poster flash talks, a local nature walk, as well as celebratory cake. During the breaks, the poster session will be a great opportunity to catch up with other users and collaborators of the ChEMBL resources and chat to colleagues, co-workers and others to find out more about how the database is being used. Lunch and refreshments will be pro

ChEMBL 34 is out!

We are delighted to announce the release of ChEMBL 34, which includes a full update to drug and clinical candidate drug data. This version of the database, prepared on 28/03/2024 contains:         2,431,025 compounds (of which 2,409,270 have mol files)         3,106,257 compound records (non-unique compounds)         20,772,701 activities         1,644,390 assays         15,598 targets         89,892 documents Data can be downloaded from the ChEMBL FTP site:  https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_34/ Please see ChEMBL_34 release notes for full details of all changes in this release:  https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_34/chembl_34_release_notes.txt New Data Sources European Medicines Agency (src_id = 66): European Medicines Agency's data correspond to EMA drugs prior to 20 January 2023 (excluding vaccines). 71 out of the 882 newly added EMA drugs are only authorised by EMA, rather than from other regulatory bodies e.g.

RDKit, C++ and Jupyter Notebook

Fancy playing with RDKit C++ API without needing to set up a C++ project and compile it? But wait... isn't C++ a compiled programming language? How this can be even possible? Thanks to Cling (CERN's C++ interpreter) and xeus-cling jupyter kernel is possible to use C++ as an intepreted language inside a jupyter notebook! We prepared a simple notebook showing few examples of RDKit functionalities and a docker image in case you want to run it. With the single requirement of docker being installed in your computer you'll be able to easily run the examples following the three steps below: docker pull eloyfelix/rdkit_jupyter_cling docker run -d -p 9999:9999 eloyfelix/rdkit_jupyter_cling open  http://localhost:9999/notebooks/rdkit_cling.ipynb  in a browser