
Posts

Showing posts from 2023

The ChEMBL team wishes a Merry Xmas 2023!

2023 has been a productive year for the ChEMBL team, with two separate releases of ChEMBL, an overhaul of the ChEBI data schema, and the release of the new SureChEMBL. We introduced some new features in ChEMBL, including flags for Natural Products and Chemical Probes, and updated our Natural Product-likeness score. Drug data and drug warning information were updated for ChEMBL 32. You can find more detailed information in our 2023 NAR update paper. Another focus this year was improving documentation and processes to make data deposition for ChEMBL easier. The curation of data on assay parameters is a constant endeavour in the team and has recently been described in two separate blog posts: one on the "AIDX" and another on the perfect assay description. We also developed our own Nature Trail event on campus, highlighting some of the bioactive compounds from the flora and fauna found on-site and elsewhere. Now it is t

In search of the perfect assay description

Credit: Science biotech, CC BY-SA 4.0. Assays describe the experimental set-up when testing the activity of drug-like compounds against biological targets; they provide useful context for researchers interested in drug-target relationships. Version 33 of ChEMBL contains 1.6 million diverse assays spanning ADMET, physicochemical, binding, functional and toxicity experiments. A set of well-defined and structured assay descriptions would be valuable for the drug discovery community, particularly for text mining and NLP projects. These would also support ChEMBL's ongoing efforts towards an in vitro assay classification. This blog post will consider the features of the 'perfect' assay description and provide a guide for depositors on the submission of high-quality data. ChEMBL's assay descriptions are typically structured around the overall aim, target, and method. The ideal assay description is succinct but contains all the necessary information for easy interpretation by database u

New SureChEMBL announcement

(Generated with DALL-E 3 ∙ 30 October 2023 at 1:48 pm) We have some very exciting news to report: the new SureChEMBL is now available! Hooray! What is SureChEMBL, you may ask? Good question! In our portfolio of chemical biology services, alongside our established database of bioactivity data for drug-like molecules, ChEMBL, our dictionary of annotated small molecule entities, ChEBI, and our compound cross-referencing system, UniChem, we also deliver a database of annotated patents! Almost 10 years ago, EMBL-EBI acquired the SureChem system of chemically annotated patents and made it freely accessible in the public domain as SureChEMBL. Since then, our team has continued to maintain and deliver SureChEMBL. However, this has become increasingly challenging due to the complexities of the underlying codebase. We were awarded a Wellcome Trust grant in 2021 to completely overhaul SureChEMBL, with a new UI, backend infrastructure, and new f

Data Deposition made easy

Bioactivity data in ChEMBL originates from two main sources: the biomedical literature and data deposited directly with ChEMBL. Over the years, we have seen an increase in deposited data sets (as shown in the Figure above). To make data deposition easier, we recently made major updates to our deposition guide and supplemented it with a deposition video and a brief checklist for depositors. Compound structural data must be supplied to the ChEMBL team in CTAB format, i.e. as a MOL file. If a whole set of structures is supplied, then an SDF file is required. There are several ways to convert a list of SMILES to SDF. One way is to use our public ChEMBL Beaker API. Its functionalities are explained in more detail here and they can be tested interactively here. Just a few lines of code are needed to convert a list of SMILES strings into SD format.

    from chembl_webresource_client.utils import utils
    smiles = ["CCO", "CC(=O)C1=CC=CC=C1C(=O)O", "CCN(CC)C(
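For depositors who already hold individual MOL blocks, the relationship between a MOL file and an SD file can be sketched in plain Python: an SD file is simply the MOL blocks written one after another, each record closed by a line containing `$$$$`. The helper name `molblocks_to_sdf` and the single-atom placeholder structure below are assumptions made for illustration; they are not part of the ChEMBL Beaker API or the deposition guide.

```python
# Illustrative placeholder only: a single-carbon V2000 MOL block.
# It is a hypothetical example, not a structure taken from the post.
METHANE_MOLBLOCK = "\n".join([
    "methane",                  # header line 1: molecule name
    "  hypothetical-example",   # header line 2: program line (free text here)
    "",                         # header line 3: comment
    "  1  0  0  0  0  0  0  0  0  0999 V2000",  # counts line: 1 atom, 0 bonds
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
])


def molblocks_to_sdf(molblocks):
    """Join MOL blocks into one SD-format string (hypothetical helper).

    Each record is the MOL block followed by a '$$$$' delimiter line.
    No chemistry is validated here; only the record layout is handled.
    """
    records = []
    for index, block in enumerate(molblocks):
        text = block.rstrip("\r\n")  # drop trailing newlines only
        lines = text.splitlines()
        if not lines or lines[-1].strip() != "M  END":
            raise ValueError(f"MOL block {index} does not end with 'M  END'")
        records.append(text + "\n$$$$")
    return "\n".join(records) + "\n"


sdf_text = molblocks_to_sdf([METHANE_MOLBLOCK, METHANE_MOLBLOCK])
```

The resulting `sdf_text` can be written to a `.sdf` file. Real submissions would normally carry MOL blocks produced by a cheminformatics toolkit or by the Beaker API rather than hand-written ones.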