AI-driven bioassay annotation strategy
The continued expansion of ChEMBL bioactivity data makes high-quality, structured assay metadata essential for reproducible analysis and for machine-learning applications aligned with FAIR principles. Recent work by our team, published in J. Cheminf., describes coordinated manual and AI-driven strategies to enhance the annotation, classification, and interoperability of ChEMBL bioassays. In this work, we developed a spaCy-based named entity recognition (NER) model, trained on manually curated assay descriptions, to identify the Experimental Method within ChEMBL assay descriptions. The model achieved cross-validated precision, recall, and F1-scores of approximately 0.93, 0.95, and 0.94, respectively, and detected experimental methods in ~57% of binding and functional assays in ChEMBL 35. Extracted method terms were subsequently mapped to the Bioassay Ontology (BAO), demonstrating good precision at higher confidence thresholds bu...
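As a quick consistency check on the scores quoted above, the F1-score is the harmonic mean of precision and recall. The sketch below is only an illustration of that arithmetic, not the evaluation code used in the paper; the function name `f1_score` is our own, and the inputs are the rounded values from the text.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Avoid division by zero when both scores are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Approximate cross-validated values quoted in the post (rounded figures).
precision, recall = 0.93, 0.95
f1 = f1_score(precision, recall)
print(f"F1 = {f1:.2f}")  # prints "F1 = 0.94"
```

The result agrees with the reported F1 of approximately 0.94, so the three rounded figures are mutually consistent.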
The Organization of Drug Discovery Data