The ChEMBL-og

Posts

Withdrawn Drugs

There is much ongoing work within the drug discovery and toxicology communities to better understand the safety aspects of approved drugs and clinical candidate compounds, and for this reason there is clear interest in why some drugs have been approved but then subsequently withdrawn from the market. This post describes the information for withdrawn drugs that is currently available in ChEMBL. Within ChEMBL (release 24) there are 192 drugs that have been annotated as approved but then subsequently withdrawn from the market for one or more reasons. For each of these drugs, the year of withdrawal, region of withdrawal and reason for withdrawal (‘withdrawn_reason’) have been available since release 22 of ChEMBL (see ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_22/archived/chembl_22_release_notes.txt), while the classification of the reason for withdrawal (‘withdrawn_class’) is a new feature for ChEMBL (release 24). The withdrawn inf...
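As a rough illustration of how the withdrawal annotations described in this excerpt might be summarised, here is a minimal Python sketch. It works on a small in-memory list of placeholder records rather than on ChEMBL itself. Only the names ‘withdrawn_reason’ and ‘withdrawn_class’ come from the post; the keys `withdrawn_year` and `withdrawn_region`, the `;`-separated storage of multiple classes, and all record values are assumptions made for this example.

```python
from collections import Counter

# Placeholder records for illustration only; none of these values come from ChEMBL.
# 'withdrawn_year' and 'withdrawn_region' are hypothetical key names.
drugs = [
    {"name": "EXAMPLE_DRUG_A", "withdrawn_year": 1998, "withdrawn_region": "Region X",
     "withdrawn_reason": "Liver toxicity", "withdrawn_class": "Hepatotoxicity"},
    {"name": "EXAMPLE_DRUG_B", "withdrawn_year": 2004, "withdrawn_region": "Region Y",
     "withdrawn_reason": "Heart and liver toxicity",
     "withdrawn_class": "Cardiotoxicity; Hepatotoxicity"},
    {"name": "EXAMPLE_DRUG_C", "withdrawn_year": 2010, "withdrawn_region": "Region X",
     "withdrawn_reason": "Arrhythmia", "withdrawn_class": "Cardiotoxicity"},
]


def count_by_class(records):
    """Count drugs per withdrawal class.

    Assumes that a drug withdrawn for several reasons stores its classes
    in one ';'-separated string (an assumption, not a documented format).
    """
    counts = Counter()
    for record in records:
        for cls in (record.get("withdrawn_class") or "").split(";"):
            cls = cls.strip()
            if cls:
                counts[cls] += 1
    return counts


def withdrawn_since(records, year):
    """Return the names of drugs withdrawn in or after the given year."""
    return [r["name"] for r in records if r["withdrawn_year"] >= year]


class_counts = count_by_class(drugs)
recent = withdrawn_since(drugs, 2004)
```

Splitting the class string before counting means a drug withdrawn for more than one reason contributes to each of its classes, which matches the "one or more reasons" wording above.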

ChEMBL 24 Released!

We are pleased to announce the release of ChEMBL 24. This version of the database, prepared on 23/04/2018, contains: 2,275,906 compound records; 1,828,820 compounds (of which 1,820,035 have mol files); 15,207,914 activities; 1,060,283 assays; 12,091 targets; and 69,861 documents. Data can be downloaded from the ChEMBL ftp site: ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_24_1 Please see the ChEMBL_24 release notes for full details of all changes in this release: ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_24_1/chembl_24_1_release_notes.txt Change in data model and addition of activity properties and supplementary data: A new data submission format and database loader have been implemented. The new deposition system allows more advanced functionality, including the ability to update previously deposited data sets, and the ability to deposit activity data again...

Striving for Perfect Representation of Chemical Structures – is this possible?

It probably goes without saying that at ChEMBL, we have a desire to make all our data as accurate and useful as possible. With this in mind we have spent many hours over the last few years trying to curate, in particular, the structures of marketed drugs and clinical candidates. We aren’t alone in this, and more than 5 years ago people were coming across the same problems, as highlighted in this blog post by ChemConnector on Fluvastatin. Our drug curation is an ongoing and probably never-ending task, but to be honest it has proved a lot more difficult than we expected. This is for several reasons: Firstly, where does one go to find the definitive structure of a molecule? One would have thought this would be easy, but even sources such as INN and USAN don’t always agree. For example, for Telavancin the USAN_data_sheet shows a difference in the nitrogen and carbon counts in the structure images compared with the images in the INN document (although the molecular formula are t...

Schema changes coming in ChEMBL_24

Since ChEMBL was first released in 2009, the diversity of data sources and data types in the database has increased significantly. Increasingly, we are dealing with more complex assays such as measurement of drug pharmacokinetic parameters or toxicology data sets such as clinical biochemistry and tissue histopathology data. There are a number of problems handling these kinds of assays with the current data model/database schema. For example, since parameters such as compound doses or time points could not be recorded against individual activity measurements (only the whole assay) such experiments were typically split so that a separate assay was created for each compound or time point measured. This is obviously far from ideal. Another issue is that such experiments frequently measure or derive multiple endpoints from a particular assay (e.g., AUC, Cmax, tmax, t1/2 for a pharmacokinetic study) or produce large amounts of raw data that may need to be associated with summary-level ...
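The modelling problem described in this excerpt can be pictured with a small Python sketch. It is not the ChEMBL schema: the class names, the `properties` dictionary, the `dose_mg_kg` key and all values are hypothetical, chosen only to show how recording parameters such as dose against each activity measurement lets one assay hold several endpoints and conditions, instead of splitting the experiment into one assay per compound or time point.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    """One measured or derived endpoint (hypothetical structure)."""
    compound: str
    endpoint: str          # e.g. "Cmax" or "AUC"
    value: float
    units: str
    # Parameters recorded against this measurement, not the whole assay.
    properties: dict = field(default_factory=dict)


@dataclass
class Assay:
    """A single experiment holding many activities (hypothetical structure)."""
    description: str
    activities: list = field(default_factory=list)


def select(assay, **conditions):
    """Return activities whose properties match every given condition."""
    return [
        a for a in assay.activities
        if all(a.properties.get(k) == v for k, v in conditions.items())
    ]


# Made-up values for illustration only.
pk_study = Assay("Hypothetical pharmacokinetic study")
pk_study.activities += [
    Activity("CMPD_1", "Cmax", 1.0, "ug/mL", {"dose_mg_kg": 10}),
    Activity("CMPD_1", "AUC", 5.0, "ug.h/mL", {"dose_mg_kg": 10}),
    Activity("CMPD_1", "Cmax", 3.0, "ug/mL", {"dose_mg_kg": 30}),
]

low_dose = select(pk_study, dose_mg_kg=10)
```

Here all three measurements stay in one assay, and the dose is looked up per activity; under the older approach described above, each dose would have needed its own assay record.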

Join the ChEMBL Team!

We are looking for talented individuals to help us maintain and develop the ChEMBL and SureChEMBL resources and currently have a number of open positions within the team. If you are looking for an exciting new role and would like to work with us on the beautiful Wellcome Genome Campus, here are details of the positions: Data Integration Scientist: We are looking for a Scientist with a passion for data integration to manage the incorporation of drug discovery data into the ChEMBL database. Responsibilities will include: responsibility for the handling, processing and integration of data into the ChEMBL database; facilitating the deposition of datasets directly into ChEMBL through working with external collaborators; applying text- & data-mining techniques for the development of effective large-scale curation strategies; developing methods for the application and maintenance of ontologies in ChEMBL; working with other teams to facilitate the integration of...

Have you heard of CORBEL?

Briefly, CORBEL is an initiative of thirteen biological and medical research infrastructures, which together create a platform for harmonised user access to biological and medical technologies, biological samples and data services required by cutting-edge biomedical research. Did you know that ChEMBL, through ELIXIR, participates in the project and provides its expertise in, among other things, identification of existing bioactivities for compounds of interest, profiling of chemotypes, target identification, data storage and distribution? But of course, CORBEL gives you access to different services working in many different biomedical areas. If you want to screen the compounds you have identified and then use Electron Microscopy to observe their effect on a cell type of interest, there are services for you! This is just one example of how CORBEL can help boost your research project(s); don’t forget we are 37 partners! As part of the WP...

ChEMBL tissues: Increasing depth, breadth and accuracy of annotations

Our current tissue annotation efforts have focused on increasing the breadth and depth of the tissue annotation first started in ChEMBL 22. The figure above represents the increased depth and coverage from that initial point until now. We continue to use a suite of tissue ontologies, namely Uberon, Experimental Factor Ontology (http://www.ebi.ac.uk/ols/ontologies/efo), CALOHA (ftp://ftp.nextprot.org/pub/current_release/controlled_vocabularies/caloha.obo) and Brenda Tissue Ontology (http://www.ebi.ac.uk/ols/ontologies/bto), to identify assays where the tissue is the assay system. We have increased the detail of information we capture to reflect the more granular tissues mentioned in the assays, such as 'Popliteal lymph node' and 'Substantia nigra pars compacta', where previously the higher-level terms 'Lymph node' and 'Substantia nigra' might have been captured. Plasma based assays We have recently focused annotation efforts on plasma bas...