The ChEMBL-og

Posts

Showing posts from 2025

New position: NLP Data Scientist/Scientific Data Engineer

As part of a funded collaboration with Open Targets, we have two open positions in a team that will develop a drug side effect resource. We are looking for two enthusiastic and talented NLP data scientists, cheminformaticians or bioinformaticians with experience in NLP and knowledge extraction to join the Open Targets Safety 2.0 project for a period of 3 years. You should enjoy delving into ways of addressing challenges in knowledge extraction and data standardisation, and want to contribute to open source code and resources. The project will develop a new side effect resource for drug discovery based on the extraction of side effect data from a range of documents. Your role will focus on developing data extraction pipelines using NLP models and implementing modern NLP methods and technologies suited to the extraction of safety data. The position provides a real opportunity to make a significant impact on a critical problem in drug discovery for the many users of the Open Ta...

Open Position: Technical Lead Chemical Biology Services

We are looking for a new Technical Lead to lead the technical maintenance and development of our group's services. This is an exciting opportunity for someone from either a scientific or an informatics background to play a key role in our team, which has a huge impact on the scientific communities that we serve. Applicants may be from anywhere in the world (visa support and relocation allowance are available); also note that the quoted salary is tax free and so is equivalent to the net salary from another job. If you have any questions about the role, feel free to reach out to me at oboyle@ebi.ac.uk. For more information, and to apply, click HERE!

About the Team

We are looking for a Technical Lead to join the Chemical Biology Resources team at the European Bioinformatics Institute (EMBL-EBI). The Chemical Biology Resources team provides world-leading chemogenomics resources to the scientific community. ChEMBL is a database of quantitative small-molecule bioactivity data curated primarily from ...

Adding Biomedical Annotation to SureChEMBL: Beyond the Chemical Space

Dear users,

Since its introduction in 2015, SureChEMBL has been a database focused on chemical annotations. We extract compound structures from patent texts, images, and Molfiles when available, and register them in our database. This chemistry-first approach is even reflected in our name. However, we know that intellectual property documents capture far more than chemistry. This was illustrated by Stefan Senger in 2017 (10.1186/s13321-017-0214-2), who showed that compound–target interactions can appear years before being mentioned in the scientific literature.

Our first step into biomedical annotation

A few years ago, we took a first step beyond chemistry by adding annotations for genes/proteins, diseases, and mechanisms of action in the SureChEMBL UI. These were generated by an in-house Natural Language Processing (NLP) model that performed reasonably well for an initial version. Example of biomedical annotation in a pa...

ChEBI 2.0 is here!

We’re excited to announce the release of ChEBI 2.0, the next generation of the Chemical Entities of Biological Interest database. If you missed the background story and the details of the redevelopment, you can catch up in our earlier blog post here.

What’s Changed:

- The new ChEBI web interface is live at https://www.ebi.ac.uk/chebi/
- Legacy SOAP web services are deprecated; please migrate to our REST APIs
- New ChEBI 2.0 data products (ontology, TSV, SDF, PostgreSQL dump) are available at https://ftp.ebi.ac.uk/pub/databases/chebi
- Submitter accounts have been migrated; you’ll need to reset your password before logging into the new submission portal

Why ChEBI 2.0?

The legacy ChEBI codebase had become increasingly difficult to maintain and update. ChEBI 2.0 has been rebuilt from the ground up with:

- Simplified PostgreSQL schema
- Faster search powered by Elasticsearch + RDKit
- Refreshed public user interface
- Modern infrastructure (Kuberne...

ChEMBL Data Deposition Webinar

The Basics of ChEMBL Data Deposition

We will shortly start work on ChEMBL 37, so we are running a 1-hour data depositor webinar for anyone who is planning to deposit data to ChEMBL 37. If you are subscribed to the ChEMBL-depositors mailing list you will have heard about this already. If not, and you have an interest in depositing data to ChEMBL in future, we strongly recommend you sign up to this depositor mailing list. This will give you updates on the submission process and deadlines.

Webinar Info

The webinar will run on Friday 3 Oct, at 10:00 am, on Zoom. We will post the meeting link to the ChEMBL-depositors mailing list the day before.

Learning Objectives of the Webinar

This webinar is designed to:

- Familiarise new depositors with the basic requirements for ChEMBL input data.
- Re-familiarise former depositors with the deposition process, and explain any new requirements.

A Reminder About ChEMBL 37 deadli...

Drug Data in ChEMBL: a critical asset

Breaking news 📢 We are excited to announce our new journal article that presents the comprehensive drug data in ChEMBL. The paper describes the state-of-the-art processes used to curate and integrate the high-quality drug and clinical candidate drug data. The drug curation processes have been developed over more than 15 years and this is the first time that they have been published: https://pubs.acs.org/doi/10.1021/acs.jmedchem.5c00920

Published as a 'Perspectives' article in the Journal of Medicinal Chemistry, the paper educates ChEMBL users, helping them to understand the nature of the drug and clinical candidate data and the rationale that underlies curation decisions. Given the increasing reliance on high-quality data in computational drug discovery, AI and machine learning, the integrated nature of the drug data within the ChEMBL bioactivity resource is a critical asset.

This is a bumper week for drug data in ChEMBL! On Monday, our latest ChEMBL 36 release included ...

ChEMBL 36 is out!

📊 Database Scale

🆕 New Data Sources

Src_ID 72 – Chemical Probe data from Scientific Literature

- NLP and manual extraction from probe-related publications.
- Includes phenotype/disease context, mapped to EFO ontology.
- Stored in assay_parameters.

🔄 Updated Data Sources

- AI-driven Structure-enabled Antiviral Platform (ASAP)
- Active Ingredient of a Prodrug
- BindingDB Patent Bioactivity Data
- British National Formulary (BNF)
- Clinical Candidate Compounds
- EMBL Heidelberg Gut Microbiome–host Interactions
- EUbOPEN Chemogenomic Library
- EUbOPEN Chemogenomic Library Literature Data
- European Medicines Agency (EMA)
- FDA Novel Drugs and Biotherapeutics
- FDA Orange Book Drugs
- International Nonproprietary Names (INN) for Pharmaceutical Substances
- SGC Frankfurt - Donated Chemical Probes
- Scientific Literature
- United States Adopted Names (USAN)
- WHO Anatomical Therapeutic Chemical (ATC) Classification of Drugs
- Withdrawn Drugs

...

Make the most of the ChEMBL interface with our newest webinar!

Before starting a new drug discovery project, it’s useful to compile existing data on compounds and targets of interest. Open access bioactivity databases such as ChEMBL support these efforts, offering structured and curated data in an easy-to-mine resource. To make the best use of ChEMBL, it’s important to understand the data types, structure and format. With this in mind, the ChEMBL team is providing a free “Drug Design” webinar! This webinar provides background on the ChEMBL database alongside worked examples highlighting some common scenarios in drug discovery initiatives. We’ve also included interface demos so that users can follow the examples and build upon these to extract data of interest to their project. We welcome feedback, questions, and suggestions and, as always, you can get in touch with us on our Helpdesk. In addition to this webinar, you can find our other training materials online, such as our deposition guide and introduction to the API, as well as an FAQ ...

SQLite versions now available for all ChEMBL releases

Last year, when I wanted to look at the evolution of ChEMBL over time, I found it quite tricky. There were 34 releases at that point, but there was only one database format that was available for all 34 versions: MySQL. Downloading and installing all of these versions was painful, as the contents of the .tar.gz are mostly but not exactly consistent, and the appropriate install command has changed a bit over the years. At the same time, since ChEMBL 19 we have provided SQLite versions of ChEMBL. This format is one of the few recommended formats for datasets as specified by the US Library of Congress (alongside JSON, CSV and XML). SQLite is an inherently simpler database format to deal with, as it doesn't require the user to set up a server and import the database; rather we provide a .sqlite file which the user can use straightaway after unzipping the .tar.gz. A member of our community, Charles Tapley Hoyt, has gone further and built on this with the ChEMBL Downloader project...
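To illustrate the "no server needed" point, here is a minimal sketch using only Python's standard `sqlite3` module. It is not taken from the post: the function name `table_row_counts` and the file name in the usage comment are hypothetical, and the code makes no assumptions about the ChEMBL schema; it simply lists whatever tables the file contains and counts their rows.

```python
import sqlite3
from pathlib import Path


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    # Open read-only so a mistyped path raises an error instead of
    # silently creating a new, empty database file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # Table names are read from the database itself; quote them as identifiers.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()


# Example usage (hypothetical file name; use the file extracted from the .tar.gz):
# for table, count in table_row_counts("chembl.sqlite").items():
#     print(f"{table}: {count}")
```

Running the same function over the file from each release would be one simple way to compare table sizes across versions, though counting rows in the largest tables may take some time.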

ChEBI Web Services Retiring on 1st September

Over the last few years we’ve been rebuilding ChEBI from the ground up so it’s faster, easier to maintain, and ready for the next decade. If you missed the background story and what’s changed under the hood, our earlier post has the details: https://chembl.blogspot.com/2025/07/redevelopment-of-chebi.html

TL;DR

- The new ChEBI web interface is live at https://www.ebi.ac.uk/chebi/beta/
- Old SOAP web services will be retired on 01 September 2025; please move to our REST APIs (https://www.ebi.ac.uk/chebi/backend/api/docs/) now. This is the new stable endpoint and will remain so for ChEBI 2.0.
- Old data product formats are being deprecated at the end of September 2025; new ChEBI 2.0 data products are already available at https://ftp.ebi.ac.uk/pub/databases/chebi-2 (moving to https://ftp.ebi.ac.uk/pub/databases/chebi after switch-over)
- Final switch-over: by end of September 2025, the new ChEBI 2.0 interface will replace the current one at https...