AI-driven bioassay annotation strategy

The continued expansion of ChEMBL bioactivity data makes high-quality, structured assay metadata essential for reproducible analysis and machine-learning applications aligned with FAIR principles. Recent work by our team, published in J. Cheminf., describes coordinated manual and AI-driven strategies to enhance the annotation, classification, and interoperability of ChEMBL bioassays. In this work, we developed a spaCy-based named entity recognition (NER) model, trained on manually curated assay descriptions, to identify the Experimental Method within ChEMBL assay descriptions. The model achieved cross-validated precision, recall, and F1-scores of approximately 0.93, 0.95, and 0.94, respectively, and detected experimental methods in ~57% of binding and functional assays in ChEMBL 35. Extracted method terms were subsequently mapped to the Bioassay Ontology (BAO), demonstrating good precision at higher confidence thresholds bu...
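For readers less familiar with these metrics, the F1-score is the harmonic mean of precision and recall. The short sketch below checks that the rounded figures quoted above are mutually consistent. The function and variable names are illustrative and not taken from the paper. Because the reported values are approximate and cross-validated, the published F1 may have been averaged per fold rather than computed from the averaged precision and recall as it is here.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Rounded values quoted in the post (approximate, cross-validated).
reported_precision = 0.93
reported_recall = 0.95

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 2))  # 0.94
```

The result matches the reported F1 of approximately 0.94 to two decimal places.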
We are excited to announce the start of the LIGAND-AI project, a 5-year project involving 18 partners to find ligands for thousands of understudied protein targets. This project, led by the SGC (Structural Genomics Consortium) and Pfizer, is part of the SGC Target 2035 strategy to discover chemical modulators for every human protein by the year 2035. There are press releases on the EMBL-EBI and SGC webpages with additional information.

Let's begin with chemical probes. These are potent and selective small molecules used to pharmacologically modulate a protein's function. They are not necessarily the starting point of a drug discovery campaign (though they may be); what they allow is the study of the target and investigation of its function. The key point is that for many protein targets we do not have a chemical probe, nor do we know their function. Targets without a chemical probe tend to be understudied as the lack of a chemical probe rules out many studies, and it ca...