The ChEMBL-og

Posts

Happy Holidays!

Photo by B. Zdrazil: Wellcome Genome Campus in morning glory, Dec. 2025.
The Chemical Biology Services Team is looking back at a year full of changes, new beginnings, and farewells to a few long-standing team members. In February, we welcomed our new Team Leader, Noel O'Boyle. He brought fresh perspectives and new energy to the team, but he also had to see several team members leave as they reached the end of their contracts. We are grateful for all the contributions that Eloy Felix, Fiona Hunter, David Mendez, Juan Mosquera, A. Lina Heinzke, Cote Falguera, Melanie Schneider, and Sybilla Corbett have made to our great resources: ChEMBL, SureChEMBL, ChEBI & UniChem. 2025 has been a very productive year! OPSIN found a new home under the EBI umbrella; ChEMBL 36, the largest ChEMBL data release ever, was released in autumn; and ChEBI 2.0 was launched, with its restructuring published in a recent NAR article. The team also made some great advances to push ...

Restructuring of ACTIVITY data: VALUE, TEXT_VALUE and ACTIVITY_COMMENT

ChEMBL is a bioactivity database for drug-like compounds. The ACTIVITIES table stores the readout from bioassays that test compounds against a biological target. These are often numerical readouts such as Ki, IC50, Inhibition %, or Cmax but are occasionally non-numerical summary data, e.g., “Not Soluble”, “Not active”, or “Active”. Historically, the non-numerical data was captured as an ACTIVITY_COMMENT, but since ChEMBL 24 this has been more accurately captured as TEXT_VALUE. Whilst the TEXT_VALUE field has been used more extensively in recent releases, legacy data covering experimental outcomes, observations, context such as threshold values, and other metadata is still largely hosted in the ACTIVITY_COMMENT field. Moving forward, only the TEXT_VALUE field will be used to report the primary outcome of an experiment where the output is not numerical, for example categorical data. This could be reporting an Activity (e.g., “Active”/“Not active”), Toxicity (e.g., “Toxic”/ “ ...
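As a rough illustration of the distinction described above, the following Python sketch picks the primary outcome of a single activity record. It is not ChEMBL code or a ChEMBL API: the record is a plain dictionary whose keys (`value`, `units`, `text_value`, `activity_comment`) are assumed, lower-cased stand-ins for the columns named in the post, and the fallback to `activity_comment` for legacy rows is an assumption made for this example only.

```python
from typing import Optional, Tuple


def primary_outcome(record: dict) -> Optional[Tuple[str, str]]:
    """Return (source_field, outcome) for one activity record, or None.

    Hypothetical helper: the dict layout and the lookup order are
    assumptions for illustration, not ChEMBL's own logic.
    """
    value = record.get("value")
    if value is not None:
        # Numerical readout (e.g. Ki, IC50): report the value with its units.
        units = record.get("units") or ""
        return ("VALUE", f"{value} {units}".strip())

    text_value = record.get("text_value")
    if text_value:
        # Non-numerical primary outcome, e.g. "Active" / "Not active".
        return ("TEXT_VALUE", text_value)

    comment = record.get("activity_comment")
    if comment:
        # Assumed fallback for legacy rows that still keep the outcome here.
        return ("ACTIVITY_COMMENT", comment)

    return None


if __name__ == "__main__":
    examples = [
        {"value": 12.5, "units": "nM"},
        {"value": None, "text_value": "Not active"},
        {"activity_comment": "Not Soluble"},
    ]
    for rec in examples:
        print(primary_outcome(rec))
```

Returning the source field alongside the outcome makes it easy to see which records still rely on the legacy comment field.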

Streamlining the Pesticide Mechanism of Action Classification

Since ChEMBL 20, molecules in ChEMBL that are known pesticides have been linked to the mechanism of action classification assigned by the Fungicide Resistance Action Committee (FRAC: http://www.frac.info), Herbicide Resistance Action Committee (HRAC: http://www.hracglobal.com) or Insecticide Resistance Action Committee (IRAC: http://www.irac-online.org). These Committees provide information on the mechanism of action of key pesticides as part of ongoing efforts to combat pesticide-resistant fungi, plants, and insects, and to prolong the use of existing pesticides. Supplementing ChEMBL with curated pesticide data from key organisations complements the wealth of compounds within ChEMBL explored for applications in human health, supporting a move towards a OneHealth approach. These classification schemes group pesticides both by their mode of action and by their chemical class. From ChEMBL 20 to ChEMBL 35, classifications were stored within ChEMBL in three tables, with three associated mapping tabl...

New position: NLP Data Scientist/Scientific Data Engineer

As part of a funded collaboration with Open Targets, we have two open positions in a team that will develop a drug side effect resource. We are looking for two enthusiastic and talented NLP data scientists, cheminformaticians or bioinformaticians with experience in NLP and knowledge extraction to join the Open Targets Safety 2.0 project for a period of 3 years. You should enjoy delving into ways of addressing challenges in knowledge extraction and data standardisation, and want to contribute to open source code and resources. The project will develop a new side effect resource for drug discovery based on the extraction of side effect data from a range of documents. Your role will focus on developing data extraction pipelines using NLP models and implementing modern NLP methods and technologies suitable for the extraction of safety data. The position provides a real opportunity to make a significant impact on a critical problem in drug discovery for the many users of the Open Ta...

Open Position: Technical Lead Chemical Biology Services

We are looking for a new Technical Lead to lead the technical maintenance and development of our group's services. This is an exciting opportunity for someone from either a scientific or an informatics background to play a key role in our team, which has a huge impact on the scientific communities that we serve. Applicants may be from anywhere in the world (visa support and relocation allowance); also note that the quoted salary is tax free and so is equivalent to the net salary from another job. If you have any questions about the role, feel free to reach out to me at oboyle@ebi.ac.uk. For more information, and to apply, click HERE!
About the Team
We are looking for a Technical Lead to join the Chemical Biology Resources team at the European Bioinformatics Institute (EMBL-EBI). The Chemical Biology Resources team provides world-leading chemogenomics resources to the scientific community. ChEMBL is a database of quantitative small-molecule bioactivity data curated primarily from ...

Adding Biomedical Annotation to SureChEMBL: Beyond the Chemical Space

Dear users,
Since its introduction in 2015, SureChEMBL has been a database focused on chemical annotations. We extract compound structures from patent texts, images, and Molfiles when available, and register them in our database. This chemistry-first approach is even reflected in our name. However, we know that intellectual property documents capture far more than chemistry. This was illustrated by Stefan Senger in 2017 (10.1186/s13321-017-0214-2), who showed that compound–target interactions can appear years before being mentioned in the scientific literature.
Our first step into biomedical annotation
A few years ago, we took a first step beyond chemistry by adding annotations for genes/proteins, diseases, and mechanisms of action in the SureChEMBL UI. These were generated by an in-house Natural Language Processing (NLP) model that performed reasonably well for an initial version.
Example of biomedical annotation in a pa...