OPSIN v2.9.0 released

Just a quick note to say that Daniel Lowe has released OPSIN v2.9.0, the first release since Oct 2023. This is now available via the EMBL-EBI OPSIN server. The release notes describe a mixture of minor bug fixes and improvements:

- Support for IUPAC-recommended primed number-letter locants, e.g. 2''a
- Command-line output now includes warnings, e.g. ambiguity
- SMILES writer now starts from a * atom if one is present
- Added numbering to nicotine
- Correct interpretation of locanted perhalo terms and perhaloalkylalkanes
- Improved additive bond formation for phosphoryl
- Corrected locants on tolyl, and p-tolyl is now assumed if unspecified
- triazine is now interpreted as 1,3,5-triazine if unspecified
- Corrected interpretation of dithiazolium
- Fixed rare SMILES writing bug where slashes could be inconsistent
- Fixed ylidenethenylidene being parsed as [ylidene][thenylidene] instead of [yliden][ethenylidene]
- Fixed bug in spiro superscript inferring when a bridge is length 0
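The ylidenethenylidene fix is a nice illustration of why name parsing is hard: the same string can be split into known fragments in more than one way. The following is only a toy sketch in Python, not OPSIN's actual parser (OPSIN is written in Java and uses a much richer grammar). The four-entry vocabulary and the segmentations function are hypothetical, chosen purely to reproduce the ambiguity mentioned in the release notes.

```python
from typing import List, Set

# Hypothetical toy vocabulary; OPSIN's real token set is far larger.
TOKENS: Set[str] = {"yliden", "ylidene", "ethenylidene", "thenylidene"}


def segmentations(name: str, vocab: Set[str]) -> List[List[str]]:
    """Return every way to split `name` into consecutive tokens from `vocab`."""
    if not name:
        return [[]]  # one valid split of the empty string: no tokens
    results: List[List[str]] = []
    for end in range(1, len(name) + 1):
        head = name[:end]
        if head in vocab:
            # Keep this token and recursively split whatever remains.
            for rest in segmentations(name[end:], vocab):
                results.append([head] + rest)
    return results


parses = segmentations("ylidenethenylidene", TOKENS)
for parse in parses:
    print("".join(f"[{token}]" for token in parse))
```

Both [yliden][ethenylidene] and [ylidene][thenylidene] come back as valid splits, so a parser needs some further rule to prefer the chemically intended one.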
Recent posts

OPSIN vs AI

I recently prepared a few slides on OPSIN for an internal presentation, and was looking for a simple use case. The first thing I tried turned out to be more interesting than I expected. If you visit the OPSIN website, there are three examples provided to illustrate its functionality. Daniel's original website, at the Uni of Cambridge, had 2,4,6-trinitrotoluene (TNT) as the example. With the move to EMBL-EBI and associated rewrite of the frontend, I thought about keeping this but decided that something more biologically relevant would be appropriate. In the end, I compromised by keeping the 2,4,6- as a nod to the original, but used a saccharide instead: 2,4,6-tri-O-methyl-D-glucopyranose. Now click on "Search Google" to do a search using the InChIKey. My attention was drawn to the AI summary results, which I captured at the time (maybe you can tell when?) in the screenshot below: "The string UTLUVTKMAWSZKV-NEIVSKJXSA-N is an InChIKey (International Chemical Identifie...

Second announcement of 2nd ChEMBL User Group Meeting

This is a reminder that the 2nd ChEMBL User Group Meeting will take place on June 10-11 on the Wellcome Genome Campus, Hinxton, near Cambridge, UK. This event is dedicated to building and supporting the ChEMBL and SureChEMBL user communities. This is a two-day event; while hybrid attendance on Day 1 is possible, we really encourage in-person participation so that you can meet the team, present your work, network, and take part in Day 2, which sets the scene for the future direction of the group. The deadline for speaker registration is two weeks from now, on 18th March, so register now. We hope to see you there.

Recording: SureChEMBL2.0 - Now with Added Disease and Protein Annotations

A week ago, I had the pleasure of presenting SureChEMBL2.0 at the Cambridge Cheminformatics Network Meeting, organised by Andreas Bender and kindly hosted by the Cambridge Crystallographic Data Centre. It was a great opportunity to introduce one of the latest freely available databases of scientifically annotated patents to a broad scientific audience. The recording of the talk is now available online, along with the slides. What did I cover during this 30-minute talk?

- Why scientists should pay attention to patent data
- Why patents are challenging to work with
- What SureChEMBL is and what it does
- How we identify chemical compounds in patent documents
- What SureChEMBL 2.0 has recently introduced
- How we annotate patents for genes/proteins and diseases
- How we are improving the quality of structures extracted from images
- What you can download from the SureChEMBL core datasets, and what they contain
- Examples of queries that SureChEMBL h...

AI-driven Annotation and FAIRification of ChEMBL Bioassays

AI-driven bioassay annotation strategy

The continued expansion of ChEMBL bioactivity data makes high-quality, structured assay metadata essential for reproducible analysis and machine-learning applications aligned with FAIR principles. Recent work by our team published in J. Cheminf. describes coordinated manual and AI-driven strategies to enhance the annotation, classification, and interoperability of ChEMBL bioassays. In this work, we have developed a spaCy-based named entity recognition (NER) model trained on manually curated assay descriptions to identify the Experimental Method within ChEMBL assay descriptions. The model achieved cross-validated precision, recall, and F1-scores of approximately 0.93, 0.95, and 0.94, respectively, and detected experimental methods in ~57% of binding and functional assays in ChEMBL 35. Extracted method terms were subsequently mapped to the Bioassay Ontology (BAO), demonstrating good precision at higher confidence thresholds bu...
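As a quick sanity check on the reported NER metrics: F1 is the harmonic mean of precision and recall, so the three figures should agree with each other. The sketch below assumes the standard F1 definition and uses the rounded values quoted above; the f1_score helper is written here for illustration and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Rounded values quoted in the post; the underlying exact figures may differ.
f1 = f1_score(0.93, 0.95)
print(round(f1, 2))  # 0.94
```

The result rounds to 0.94, consistent with the F1-score quoted in the post.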

Supporting LIGAND-AI's search for ligands for understudied targets

We are excited to announce the start of the LIGAND-AI project, a 5-year project involving 18 partners to find ligands for thousands of understudied protein targets. This project, led by the SGC (Structural Genomics Consortium) and Pfizer, is part of the SGC Target 2035 strategy to discover chemical modulators for every human protein by the year 2035. There are press releases on the EMBL-EBI and SGC webpages with additional information. Let's begin with chemical probes. These are potent and selective small molecules used to pharmacologically modulate a protein's function. These are not necessarily the starting point of a drug discovery campaign (though they may be); what they allow is the study of the target and investigation of its function. The key point is that for many protein targets we do not have a chemical probe, nor do we know their function. Targets without a chemical probe tend to be understudied as the lack of a chemical probe rules out many studies, and it ca...

Exploring Targeted Protein Degradation Data in ChEMBL

Drug modalities continue to evolve and the capture of data from recent literature ensures that ChEMBL remains up-to-date with recent developments. Drugs and clinical candidates in ChEMBL undergo enhanced curation, including annotation of their drug mechanisms (the therapeutic target with a role in disease efficacy and the action type of the drug against the entity, e.g. INHIBITOR). Classical small-molecule modulators typically bind to the active site of a target, leading to either a loss or, in some cases, an increase in protein function. More recently, emerging strategies such as Targeted Protein Degradation (TPD) have gained traction and instead remove (degrade) the target. The growing body of TPD data in ChEMBL will be discussed in this blog post.

Targeted Protein Degraders (PROTACs) from the ChEMBL database.

To explore TPD data in ChEMBL, we’ve used a combination of TPD-relevant keywords and effector protein accessions (UniProt). If you’re interested in...