Skip to main content

New ChEMBL Web Services




Following on from our recent ChEMBL 20 release we are pleased to announce an updated version of the ChEMBL web services. First things first, some of the most important bits of information:
  • You can explore new resources using online documentation available here: https://www.ebi.ac.uk/chembl/api/data/docs
  • The code is Apache 2.0 licensed and available on GitHub: https://github.com/chembl/chembl_webservices_2
  • The basic URL for accessing web services is https://www.ebi.ac.uk/chembl/api/data/ so in order to retrieve some information about molecules, you would construct following URL: https://www.ebi.ac.uk/chembl/api/data/molecule
  • The 'old' web services are still available, so any applications and code relying on these services will continue to work. We do encourage you to review and migrate any code which uses these services to the updated versions described in this blog post.

 

And now some short Q&A session:

 

Q: But you already have web services available here: https://www.ebi.ac.uk/chemblws/ and they are sooo great, so why release something new?

A: With each new ChEMBL release more and more tables are being added to the underlying database schema, which have not been reflected in the web services for a long time. Additionally we had many suggestions from our users, which we have tried to address where possible. So in order expose more data and provide new functionality, it made sense to release the new and improved ChEMBL web services. The table below summaries the most important differences between old and updated ChEMBL web services: 

Feature
Original ChEMBL Web Services
Updated ChEMBL Web Services
Base URL
https://www.ebi.ac.uk/chemblws
https://www.ebi.ac.uk/chembl/api/data
Number of resources
5
17
Pagination
No
Yes
Filtering
No
Yes
Ordering
No
Yes
Raster Images
Yes
Yes
Vector Images
No
Yes
JSONP support
Yes
Yes
CORS support
Yes
Yes
Online Documentation
Python client library
Yes
Yes
Available REST verbs
GET, POST
GET, POST
Support for SMILES in GET
Partial
Full

As you see, the most important difference are the number of new resources, for example we now  include 'activity', 'cell_line', 'document' and many more. Existing resources (e.g. molecule, target, assay) have also been updated to include many new important attributes.

Another useful feature is filtering, you can now for example retrieve all molecules with a molecule weight less than 300 Da and preferred name ending with 'nib' using following URL:
https://www.ebi.ac.uk/chembl/api/data/molecule?molecule_properties__mw_freebase__lte=300&pref_name__iendswith=nib. You can apply filtering to every resource and filter by most of the fields belonging to given resource so for example targets that contain 'kinase' in pref_name would be: https://www.ebi.ac.uk/chembl/api/data/target?pref_name__contains=kinase. Don't worry if you do not follow the filtering query language, there will be a separate blog post on this topic.

Another new (and much requested) feature is the ability to easily retrieve multiple records from a resource. There are currently many ways to do this (filtering is one of them), but a more intuitive way is to get multiple records in bulk via the list of IDs. This is now possible, for example to retrieve ChEMBL molecules CHEMBL900, CHEMBL1096643 and CHEMBL1490, you can use: https://www.ebi.ac.uk/chembl/api/data/molecule/set/CHEMBL900;CHEMBL1096643;CHEMBL1490.

Q: Are the updated ChEMBL web services compatible with the old ones?
A: No. The URLs patterns and responses returned by the web services have changed, so they are not compatible. We recommend you review any code which currently consumes the old services and look to migrated to the updated version described here.

Q: But I have a lot of code that depends on the old web services! Will it break?
A: No. The old web services are still accessible and have been updated to serve data from ChEMBL 20 release. We will continue to support the old web services, but as part of our deprecation plan, we probably aim to stop supporting them by the end of the year. Please remember that the old code will always be available on GitHub and PyPI, so it is possible to create your own instance.

Q: How can I retrieve large data sets in the updated ChEMBL web services?
A: You can now retrieve all entities of a given resource, for example https://www.ebi.ac.uk/chembl/api/data/molecule returns all molecules in ChEMBL. This is achieved by returning the data in 'pages'. In order to implement paging, data is now wrapped in an envelope(<response> tag for XML output and outermost dictionary for JSON). Inside the envelope you can find collection of objects ('molecules' for molecule resource) and meta information looking like this:



Meta data provides information about current page: limit (how many objects can be on the page) and offset (serial number of the first object), as well as links to the previous and next page and total object count. Using the meta data it is now possible to loop through the large result set.

Q: Speaking about your python client library: will it include the old or the new version? Will it be backwards compatible?
A: Yes, version  0.8.x of the chembl_webresource_client package includes support for the updated web services. In order to access them, use following line of code:


If you list attributes of new_client object you will find that it contains all resources. The usage will be covered in separate blog post. We plan to drop support for the old services in version 0.9.x of the library, so the new_client will become default client.

Q: You claim that new web services supports GET and POST methods but online documentation lists only GET endpoints. I actually tried POST and I'm getting 405 Method now allowed error.
A: In the REST context GET, POST, PUT and DELETE are verbs meaning RETRIEVE, CREATE, UPDATE and DELETE accordingly. Because our web services are read-only we should only allow GET method. But GET has one very big limitation: all parameters (like identifiers, filtering, paging, ordering info) are appended to URL. All web servers (and some browsers) impose limits on URL length. In Apache, this limitation is 4000 characters by default. This means that if you want to get data about some (very) big molecule using SMILES as the identifier, it may happen that the resulting URL will be too long.

On the other hand, POST doesn't have this limitation as parameters are included in request body, not in URL. This is why we allow POST to retrieve data. But to be consistent with REST standard you have to let us now: 'OK, in this request I want to GET some data but I'm using POST just to pass more parameters'. In order to use it, just add header X-HTTP-Method-Override:GET to your POST request and you won't see 405 error anymore.

We hope you find the update useful and if you experience any problems or have any questions please get in touch.

The ChEMBL Team

Comments

Popular posts from this blog

A python client for accessing ChEMBL web services

Motivation The CheMBL Web Services provide simple reliable programmatic access to the data stored in ChEMBL database. RESTful API approaches are quite easy to master in most languages but still require writing a few lines of code. Additionally, it can be a challenging task to write a nontrivial application using REST without any examples. These factors were the motivation for us to write a small client library for accessing web services from Python. Why Python? We choose this language because Python has become extremely popular (and still growing in use) in scientific applications; there are several Open Source chemical toolkits available in this language, and so the wealth of ChEMBL resources and functionality of those toolkits can be easily combined. Moreover, Python is a very web-friendly language and we wanted to show how easy complex resource acquisition can be expressed in Python. Reinventing the wheel? There are already some libraries providing access to ChEMBL d

ChEMBL 29 Released

  We are pleased to announce the release of ChEMBL 29. This version of the database, prepared on 01/07/2021 contains: 2,703,543 compound records 2,105,464 compounds (of which 2,084,724 have mol files) 18,635,916 activities 1,383,553 assays 14,554 targets 81,544 documents Data can be downloaded from the ChEMBL FTP site:   https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_29 .  Please see ChEMBL_29 release notes for full details of all changes in this release: https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_29/chembl_29_release_notes.txt New Deposited Datasets EUbOPEN Chemogenomic Library (src_id = 55, ChEMBL Document IDs CHEMBL4649982-CHEMBL4649998): The EUbOPEN consortium is an Innovative Medicines Initiative (IMI) funded project to enable and unlock biology in the open. The aims of the project are to assemble an open access chemogenomic library comprising about 5,000 well annotated compounds covering roughly 1,000 different proteins, to synthesiz

Identifying relevant compounds in patents

  As you may know, patents can be inherently noisy documents which can make it challenging to extract drug discovery information from them, such as the key targets or compounds being claimed. There are many reasons for this, ranging from deliberate obfuscation through to the long and detailed nature of the documents. For example, a typical small molecule patent may contain extensive background information relating to the target biology and disease area, chemical synthesis information, biological assay protocols and pharmacological measurements (which may refer to endogenous substances, existing therapies, reaction intermediates, reagents and reference compounds), in addition to description of the claimed compounds themselves.  The SureChEMBL system extracts this chemical information from patent documents through recognition of chemical names, conversion of images and extraction of attached files, and allows patents to be searched for chemical structures of interest. However, the curren

Julia meets RDKit

Julia is a young programming language that is getting some traction in the scientific community. It is a dynamically typed, memory safe and high performance JIT compiled language that was designed to replace languages such as Matlab, R and Python. We've been keeping an an eye on it for a while but we were missing something... yes, RDKit! Fortunately, Greg very recently added the MinimalLib CFFI interface to the RDKit repertoire. This is nothing else than a C API that makes it very easy to call RDKit from almost any programming language. More information about the MinimalLib is available directly from the source . The existence of this MinimalLib CFFI interface meant that we no longer had an excuse to not give it a go! First, we added a BinaryBuilder recipe for building RDKit's MinimalLib into Julia's Yggdrasil repository (thanks Mosè for reviewing!). The recipe builds and automatically uploads the library to Julia's general package registry. The build currently targe

New Drug Warnings Browser

As mentioned in the announcement post of  ChEMBL 29 , a new Drug Warnings Browser has been created. This is an updated version of the entity browsers in ChEMBL ( Compounds , Targets , Activities , etc). It contains new features that will be tried out with the Drug Warnings and will be applied to the other entities gradually. The new features of the Drug Warnings Browser are described below. More visible buttons to link to other entities This functionality is already available in the old entity browsers, but the button to use it is not easily recognised. In the new version, the buttons are more visible. By using those buttons, users can see the related activities, compounds, drugs, mechanisms of action and drug indications to the drug warnings selected. The page will take users to the corresponding entity browser with the items related to the ones selected, or to all the items in the dataset if the user didn’t select any. Additionally, the process of creating the join query is no