(Generated with DALL-E 3 ∙ 30 October 2023 at 1:48 pm)
We have some very exciting news to report: the new SureChEMBL is now available! Hooray!
What is SureChEMBL, you may ask? Good question! In our portfolio of chemical biology services, alongside ChEMBL (our established database of bioactivity data for drug-like molecules), ChEBI (our dictionary of annotated small molecule entities), and UniChem (our compound cross-referencing system), we also deliver a database of annotated patents!
Almost 10 years ago, EMBL-EBI acquired the SureChem system of chemically annotated patents and made it freely accessible in the public domain as SureChEMBL. Since then, our team has continued to maintain and deliver SureChEMBL. However, this has become increasingly challenging due to the complexities of the underlying codebase. We were awarded a Wellcome Trust grant in 2021 to completely overhaul SureChEMBL, with a new UI, backend infrastructure, and new features. We are now able to make available the first outputs from this project, which address the first two of these deliverables (the new UI and the backend infrastructure), with more to come in the future!
The table below compares the two systems. In short, the new SureChEMBL has a more modern look-and-feel (very similar to our current ChEMBL interface) and is significantly faster: roughly 4 to 10 times faster in the two timings listed in the table. Most importantly for us, it is easier to support, and it should make it much easier to develop and deliver new functionalities. For instance, we are now able to provide a public API. We have also changed the way we provide the biological annotations, switching from a dictionary-based to a machine learning-based approach.
| Aspect | Legacy SureChEMBL | New SureChEMBL |
|---|---|---|
| interface | outdated; compatibility issues with modern web browsers | revamped; modern; single page interface with unique search field |
| architecture | monolithic | container-based; scalable |
| chemical annotation | image to structure (3 methods); text to structure (5 methods); Mol files (certain US patents only) | image to structure (3 methods); text to structure (5 methods); Mol files (certain US patents only) |
| biological annotation | dictionary-based; generated on request; no download | machine learning-based; automatically generated; downloadable (available in a future release) |
| responsiveness: keyword search in all patent titles + open top patent | 1 min 8 s | 17 s |
| responsiveness: patent number search + open patent | 1 min 15 s | 7.5 s |
| public API | no | yes |
Keep in mind that this is just the beginning. In the coming months, we will share more about the changes to the website, the system, the annotation pipeline, the options for direct data downloads, and much more.
For now, remember that this new system is a beta version, so if you notice something wrong or have suggestions, please contact our help-desk or open a ticket directly on GitHub.
Don’t forget that you can subscribe to our mailing list if you want to receive the latest SureChEMBL announcements.
Final note: The old SureChEMBL will remain accessible for a short period of time before being permanently shut down. For technical reasons, its database won’t be updated with the most recent patents. It is accessible at this URL.
(the new SureChEMBL interface)