Posts

Showing posts from November, 2025

Adding Biomedical Annotation to SureChEMBL: Beyond the Chemical Space

Dear users,

Since its introduction in 2015, SureChEMBL has been a database focused on chemical annotations. We extract compound structures from patent texts, images, and Molfiles when available, and register them in our database. This chemistry-first approach is even reflected in our name.

However, we know that intellectual property documents capture far more than chemistry. This was illustrated by Stefan Senger in 2017 (10.1186/s13321-017-0214-2), who showed that compound–target interactions can appear in patents years before being mentioned in the scientific literature.

Our first step into biomedical annotation

A few years ago, we took a first step beyond chemistry by adding annotations for genes/proteins, diseases, and mechanisms of action in the SureChEMBL UI. These were generated by an in-house Natural Language Processing (NLP) model that performed reasonably well for an initial version.

Example of biomedical annotation in a pa...