The ChEMBL-og

Posts

Showing posts from 2020

Sequence similarity searches in ChEMBL

The ChEMBL database contains bioactivity data that links compounds to their biological targets. Most ChEMBL targets are proteins (~70% in version 27) and these are mapped to their UniProt accessions. On the ChEMBL interface, searches can be performed with either protein names or accessions... but did you know that protein similarity searches are also possible? Here’s an example using human Phospholipase DDHD2, a target not found in ChEMBL.

1. On the ChEMBL interface, click 'Enter a Sequence'.
2. Input the FASTA sequence corresponding to human Phospholipase DDHD2 and click 'Search in ChEMBL'.
3. Review the BLAST results, select targets of interest and browse bioactivity data.

The BLAST search identifies the mouse Phospholipase DDHD2 homologue alongside a small number of bioactivity data points and active compounds. ChEMBL's sequence search feature is currently only available through the interface. However, sequence data for prote
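Step 2 expects a FASTA-formatted protein sequence. As a small convenience, the sketch below parses FASTA text into header/sequence pairs so a sequence can be checked before it is pasted into the search box. It is a generic FASTA reader written for illustration, not part of any ChEMBL tool, and the example record is a made-up placeholder rather than the DDHD2 sequence.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}.

    The header is everything after '>' on the header line; sequence lines are
    joined and upper-cased. Blank lines are ignored.
    """
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Close the previous record before starting a new one.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical placeholder record, not a real protein entry.
example = ">example_protein hypothetical\nMKTAYIAKQR\nqisfvkshfs\n"
print(parse_fasta(example))
```

Wrapped sequence lines are joined, so a multi-line FASTA record comes back as a single string that can be pasted as-is.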

Data checks

ChEMBL contains a broad range of binding, functional and ADMET type assays in formats ranging from in vitro single protein assays to antiproliferative cell-based assays. Some variation is expected, even for very similar assays, since these are often performed by different groups and institutes. ChEMBL includes references for all bioactivity values so that full assay details can be reviewed if needed. However, there are a number of other data checks that can be used to identify potentially problematic results.

1) Data validity comments: The data validity column was first included in ChEMBL v15 and flags activities with potential validity issues, such as a non-standard unit for the activity type or values outside of the expected range. Users can review flagged activities and decide how these should be handled. The data validity column can be viewed on the interface (click 'Show/Hide columns' and select 'data validity comments') and can be found in the activities table in the
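For users working with a local copy of the database, the check above can be scripted. The sketch assumes a SQLite build of the ChEMBL download with an `activities` table that has a `data_validity_comment` column; to stay self-contained it builds a tiny in-memory stand-in table with made-up rows, and the comment strings are illustrative examples rather than a complete list of the values ChEMBL uses.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Tiny in-memory stand-in for the ChEMBL activities table (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE activities (activity_id INTEGER PRIMARY KEY, standard_type TEXT, "
        "standard_value REAL, standard_units TEXT, data_validity_comment TEXT)"
    )
    conn.executemany(
        "INSERT INTO activities VALUES (?, ?, ?, ?, ?)",
        [
            (1, "IC50", 12.0, "nM", None),
            (2, "IC50", 1.0e9, "nM", "Outside typical range"),
            (3, "Ki", 5.0, "ug.mL-1", "Non standard unit for type"),
            (4, "IC50", 30.0, "nM", None),
        ],
    )
    return conn


def validity_summary(conn: sqlite3.Connection) -> dict[str, int]:
    """Count flagged activities per data_validity_comment value."""
    rows = conn.execute(
        "SELECT data_validity_comment, COUNT(*) FROM activities "
        "WHERE data_validity_comment IS NOT NULL GROUP BY data_validity_comment"
    )
    return dict(rows.fetchall())


def unflagged_activities(conn: sqlite3.Connection) -> list[tuple]:
    """Return activities with no data validity flag, ordered by activity_id."""
    return conn.execute(
        "SELECT activity_id, standard_type, standard_value, standard_units "
        "FROM activities WHERE data_validity_comment IS NULL ORDER BY activity_id"
    ).fetchall()


conn = build_demo_db()
print(validity_summary(conn))
print(unflagged_activities(conn))
```

Reviewing the summary first, rather than silently dropping every flagged row, keeps the decision about how to handle each kind of flag with the user.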

Molecule hierarchy

During drug development, active pharmaceutical ingredients are often formulated as salts to provide the final pharmaceutical product. ChEMBL includes parent molecules and their salts (approved and investigational) as well as other alternative forms such as hydrates and radioisotopes. These alternative forms are linked to their parent compound through the molecule hierarchy.

Using the molecule hierarchy

The molecule hierarchy can be used to retrieve and display connected compounds and to aggregate activity data that has been mapped to any member of a compound family. On the interface, related compounds are automatically displayed in the ‘Alternative forms’ section of the ChEMBL compound report card. Bioactivity data can easily be aggregated in the activity summary by using the 'Include/Exclude Alternative Forms' filter.

Finding the molecule hierarchy

On the interface, alternative forms are included as described above. The downloaded database contains the molecule_hierarchy table
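The same aggregation can be done on a downloaded copy. The sketch assumes a SQLite build of the download in which `molecule_hierarchy` has `molregno`, `parent_molregno` and `active_molregno` columns and `activities` has a `molregno` column; the in-memory tables and all molregno values below are made up. A `LEFT JOIN` with `COALESCE` is used so that an activity whose compound has no hierarchy row is counted under its own molregno instead of being dropped.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for molecule_hierarchy and activities (made-up molregnos)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE molecule_hierarchy (
            molregno INTEGER PRIMARY KEY, parent_molregno INTEGER, active_molregno INTEGER);
        CREATE TABLE activities (activity_id INTEGER PRIMARY KEY, molregno INTEGER);
        -- molregno 1 is a parent; 2 (e.g. a salt) and 3 (e.g. a hydrate) map to it.
        INSERT INTO molecule_hierarchy VALUES (1, 1, 1), (2, 1, 1), (3, 1, 1), (10, 10, 10);
        -- molregno 20 deliberately has no hierarchy row.
        INSERT INTO activities VALUES (100, 1), (101, 2), (102, 3), (103, 10), (104, 20);
        """
    )
    return conn


def alternative_forms(conn: sqlite3.Connection, molregno: int) -> list[int]:
    """All molregnos sharing the parent of `molregno`, including the parent itself."""
    rows = conn.execute(
        "SELECT molregno FROM molecule_hierarchy WHERE parent_molregno = "
        "(SELECT parent_molregno FROM molecule_hierarchy WHERE molregno = ?) "
        "ORDER BY molregno",
        (molregno,),
    )
    return [r[0] for r in rows]


def activity_counts_by_parent(conn: sqlite3.Connection) -> list[tuple[int, int]]:
    """Count activities per parent compound, falling back to the molregno itself."""
    rows = conn.execute(
        "SELECT COALESCE(mh.parent_molregno, a.molregno) AS parent, COUNT(*) "
        "FROM activities a LEFT JOIN molecule_hierarchy mh ON a.molregno = mh.molregno "
        "GROUP BY parent ORDER BY parent"
    )
    return rows.fetchall()


conn = build_demo_db()
print(alternative_forms(conn, 2))        # [1, 2, 3]
print(activity_counts_by_parent(conn))   # [(1, 3), (10, 1), (20, 1)]
```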

Using ChEMBL activity comments

We’re sometimes asked what the ‘activity_comments’ in the ChEMBL database mean. In this blog post, we’ll use aspirin as an example to explain some of the more common activity comments. First, let’s review the bioactivity data included in ChEMBL. We extract bioactivity data directly from seven core medicinal chemistry journals. Some common activity types, such as IC50s, are standardised to allow broad comparisons across assays; the standardised data can be found in the standard_value, standard_relation and standard_units fields. Original data is retained in the database downloads in the value, relation and units fields. We also extract all other data from a publication, including non-numerical bioactivity and ADME data. In these cases, the activity comments may be populated during the ChEMBL extraction-curation process in order to capture the author's overall conclusions. Similarly, for deposited datasets and subsets of other databases (e.g. DrugMatrix, PubChem), th
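One practical consequence is that a query filtering only on standard_value will miss results whose only information is in the comment. The sketch below reads both together. It assumes the activities table of a SQLite download exposes the standardised fields named above plus an `activity_comment` column; the in-memory rows are invented for illustration and are not aspirin data.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the activities table (invented rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE activities (activity_id INTEGER PRIMARY KEY, standard_type TEXT, "
        "standard_relation TEXT, standard_value REAL, standard_units TEXT, "
        "activity_comment TEXT)"
    )
    conn.executemany(
        "INSERT INTO activities VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "IC50", "=", 250.0, "nM", None),
            (2, "Inhibition", None, None, None, "Not Active"),
            (3, "Activity", None, None, None, "Active"),
            (4, "IC50", ">", 10000.0, "nM", "Not Active"),
        ],
    )
    return conn


def describe(row: tuple) -> str:
    """Prefer the standardised value; fall back to the activity comment."""
    _, stype, relation, value, units, comment = row
    if value is not None:
        text = f"{stype} {relation} {value:g} {units}"
        return f"{text} ({comment})" if comment else text
    return f"{stype}: {comment}" if comment else f"{stype}: no value or comment"


def comment_only_ids(conn: sqlite3.Connection) -> list[int]:
    """Activities that carry a comment but no standardised value."""
    rows = conn.execute(
        "SELECT activity_id FROM activities "
        "WHERE standard_value IS NULL AND activity_comment IS NOT NULL "
        "ORDER BY activity_id"
    )
    return [r[0] for r in rows]


conn = build_demo_db()
for row in conn.execute("SELECT * FROM activities ORDER BY activity_id"):
    print(describe(row))
# IC50 = 250 nM
# Inhibition: Not Active
# Activity: Active
# IC50 > 10000 nM (Not Active)
```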

FPSim2 v0.2.0

FPSim2 is the fast Python3 similarity search tool we are currently using in the ChEMBL interface. It's been a while since we first (and last) posted about it so we thought it deserved an update. We've just released a new version (v0.2.0) and the highlights since we first talked about it are:

- CPU-intensive functions moved from Cython to C++ with Pybind11. Now also using libpopcnt.
- Improved speed, especially when dealing with some edge cases.
- Conda builds available for Windows, Mac and Linux. There is no Conda build for ARM, but it also compiles and works on a Raspberry Pi! (and probably will on the new ARM Macs as well)
- Tversky search with a and b parameters (it previously had the 'substructure' feature with a and b fixed to 1 and 0 respectively).
- Distance matrix calculation of the whole set is now available.
- Zenodo DOI also available: 10.5281/zenodo.3902922

From a user point of view, the most interesting new feature is probably the distance matrix calculation. Af
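To make the Tversky parameters concrete, the sketch below computes Tversky similarity on fingerprints stored as Python integers used as bit sets. With a = b = 1 it reduces to Tanimoto, and with a = 1, b = 0 it measures the fraction of query bits present in the target, which is the old 'substructure' setting. The same arithmetic gives a small Tanimoto distance matrix. This is a plain-Python illustration of the formula only; it is not FPSim2's API or its optimised C++ implementation, and the toy fingerprints are made up.

```python
def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def tversky(query: int, target: int, a: float, b: float) -> float:
    """Tversky similarity: |Q&T| / (|Q&T| + a*|Q-T| + b*|T-Q|)."""
    common = popcount(query & target)
    only_query = popcount(query & ~target)
    only_target = popcount(target & ~query)
    denom = common + a * only_query + b * only_target
    # Two empty fingerprints share nothing measurable; treat as 0.0 here.
    return common / denom if denom else 0.0


def tanimoto_distance_matrix(fps: list[int]) -> list[list[float]]:
    """Symmetric matrix of 1 - Tanimoto (Tversky with a = b = 1) for all pairs."""
    n = len(fps)
    dm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = 1.0 - tversky(fps[i], fps[j], 1.0, 1.0)
            dm[i][j] = dm[j][i] = d
    return dm


fps = [0b1011, 0b0011, 0b0100]
print(tversky(fps[0], fps[1], 1, 1))  # Tanimoto: 2 shared bits / 3 bits in the union
print(tversky(fps[1], fps[0], 1, 0))  # 1.0: every query bit is present in the target
print(tanimoto_distance_matrix(fps))
```

Only the upper triangle is computed and mirrored, since the Tanimoto distance is symmetric; a non-symmetric Tversky setting (a ≠ b) would need the full matrix.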

Malaria inhibitor prediction platform

What a time! For most of us, this is the first time that we have experienced a pandemic and its impact on our daily life. Although working from home has become our routine in the ChEMBL group, we are still working as hard as ever! Of course, COVID-19 data is taking up some of our attention (see ChEMBL_27), but we are also continuing our work relevant to other diseases that affect large populations around the world. Today, we are going to talk about malaria. As you may know, this disease, caused by Plasmodium parasites, threatens nearly half of the world’s population and led to over 400,000 deaths in 2019, predominantly among children in resource-limited areas in Africa, Asia and Central and South America. New therapies are desperately needed, in particular to cope with increased resistance against artemisinin-based combination therapies. To help address this challenge, we have been involved in a public-private consortium that aims to deliver a tool to predict pot

ChEMBL_27 SARS-CoV-2 release

The COVID-19 pandemic has resulted in an unprecedented effort across the global scientific community. Drug discovery groups are contributing in several ways, including the screening of compounds to identify those with potential anti-SARS-CoV-2 activity. When the compounds being assayed are marketed drugs or compounds in clinical development, this may identify potential repurposing opportunities (though many other factors need to be considered, including safety and PK/PD; see for example https://www.medrxiv.org/content/10.1101/2020.04.16.20068379v1.full.pdf+html ). The results from such compound screening can also help inform and drive our understanding of the complex interplay between virus and host at different stages of infection. Several large-scale drug screening studies have now been described and made available as pre-prints or as peer-reviewed publications. The ChEMBL team has been following these developments with significant interest, and as a contr