Showing posts from 2018

Annotation of in vivo pharmacology assay data in ChEMBL

Check out our new article that provides a classification of in vivo pharmacology assay data in ChEMBL so that assays that investigate a similar disease or phenotype can be grouped: A substantial dataset of more than 135,000 in vivo assays has been collated as a key resource of animal models for translational medicine within drug discovery. To improve the utility of the in vivo data, an extensive data curation task has been undertaken that allows the assays to be grouped by animal disease model or phenotypic endpoint. The dataset contains previously unavailable information about compounds or drugs tested in animal models and, in conjunction with assay data on protein targets or cell- or tissue-based systems, allows the investigation of the effects of compounds at differing levels of biological complexity. Equally, it enables researchers to identify compounds that have been investigated for a group of disease-, pharmacology- or toxicity

Jobs in the ChEMBL Team

Looking for a change of job? Come and join the ChEMBL team. We currently have three vacancies for scientists wanting to help us develop the ChEMBL resources. The positions are based at EMBL-EBI on the Wellcome Genome Campus near Cambridge, UK. More details of these positions and how to apply can be found on the EMBL-EBI website. Scientific Data Engineer: You will contribute to the development of robust, production data pipelines as well as prototyping novel scientific solutions. You will have excellent communication skills and be able to interact with technical experts as well as scientists seeking solutions to their "real world" problems. Your role: The job responsibilities will include handling, processing and integrating data into the ChEMBL database; facilitating the deposition of datasets directly into ChEMBL by working closely with external collaborators; applying text- & data-mining techniques for the development of effective

New ChEMBL Interface

We are pleased to announce that we have a beta version of a new ChEMBL interface that we would like you to try out. It can be found here. There are also a lot of additional features in the new interface, such as free text searching. You no longer need to specify that you want to search for a compound, target, document etc. Also, as you type your search, suggestions will be made for you. You can also filter results to see just the subset of data you are interested in, using a number of different filtering options. However, the new interface still retains some of the old features, such as compound and target report cards. More details on the new features can be found here, and we have also updated our FAQs. But most of all, we hope it is intuitive to use. It will replace the old interface soon, but before we retire the old one we would like some feedback on the new one. We will continue to evolve it over the coming months so if you would like

Withdrawn Drugs

There is much ongoing work within the drug discovery and toxicology communities to better understand the safety aspects of approved drugs and clinical candidate compounds, and for this reason there is clear interest in why some drugs have been approved but then subsequently withdrawn from the market. This post describes the information for withdrawn drugs that is currently available in ChEMBL. Within ChEMBL (release 24) there are 192 drugs that have been annotated as approved but then subsequently withdrawn from the market for one or more reasons. For each of these drugs, the year of withdrawal, region of withdrawal and reason for withdrawal (‘withdrawn_reason’) have been available since release 22 of ChEMBL (see ), while the classification of the reason for withdrawal (‘withdrawn_class’) is a new feature for ChEMBL (release 24). The withdrawn information is availab

ChEMBL 24 Released!

We are pleased to announce the release of ChEMBL 24. This version of the database, prepared on 23/04/2018, contains:
    2,275,906 compound records
    1,828,820 compounds (of which 1,820,035 have mol files)
    15,207,914 activities
    1,060,283 assays
    12,091 targets
    69,861 documents
Data can be downloaded from the ChEMBL ftp site:
Please see ChEMBL_24 release notes for full details of all changes in this release:
Change in data model and addition of activity properties and supplementary data: A new data submission format and database loader has been implemented. The new deposition system allows more advanced functionality, including the ability to update previously deposited data sets, and the ability to deposit activity data against existing ChEMBL compound or assay collections. This mean

Striving for Perfect Representation of Chemical Structures – is this possible?

It probably goes without saying that at ChEMBL, we have a desire to make all our data as accurate and useful as possible. With this in mind, we have spent many hours over the last few years trying to curate, in particular, the structures of marketed drugs and clinical candidates. We aren’t alone in this, and more than 5 years ago people were coming across the same problems, as highlighted in this blog post by ChemConnector on Fluvastatin. Our drug curation is an ongoing and probably never-ending task, but to be honest it has proved a lot more difficult than we expected. This is for several reasons: Firstly, where to go to find the definitive structure of a molecule? One would have thought this would be easy, but even sources such as INN and USAN don’t always agree. For example, for Telavancin the USAN_data_sheet shows a difference in the nitrogen and carbon counts in the structure images compared with the images in the INN document (although the molecular formula are the s

Schema changes coming in ChEMBL_24

Since ChEMBL was first released in 2009, the diversity of data sources and data types in the database has increased significantly. Increasingly, we are dealing with more complex assays such as measurement of drug pharmacokinetic parameters or toxicology data sets such as clinical biochemistry and tissue histopathology data. There are a number of problems handling these kinds of assays with the current data model/database schema. For example, since parameters such as compound doses or time points could not be recorded against individual activity measurements (only the whole assay) such experiments were typically split so that a separate assay was created for each compound or time point measured. This is obviously far from ideal. Another issue is that such experiments frequently measure or derive multiple endpoints from a particular assay (e.g., AUC, Cmax, tmax, t1/2 for a pharmacokinetic study) or produce large amounts of raw data that may need to be associated with summary-level

Join the ChEMBL Team!

We are looking for talented individuals to help us maintain and develop the ChEMBL and SureChEMBL resources and currently have a number of open positions within the team. If you are looking for an exciting new role and would like to work with us on the beautiful Wellcome Genome Campus, here are details of the positions: Data Integration Scientist: We are looking for a scientist with a passion for data integration to manage the incorporation of drug discovery data into the ChEMBL database. Responsibilities will include: handling, processing and integrating data into the ChEMBL database; facilitating the deposition of datasets directly into ChEMBL through working with external collaborators; applying text- & data-mining techniques for the development of effective large-scale curation strategies; developing methods for the application and maintenance of ontologies in ChEMBL; working with other teams to facilitate the integration of

Have you heard of CORBEL?

Briefly, CORBEL is an initiative of thirteen biological and medical research infrastructures, which together create a platform for harmonised user access to biological and medical technologies, biological samples and data services required by cutting-edge biomedical research. Did you know that ChEMBL, through ELIXIR, participates in the project and provides its expertise in, among other things, identification of existing bioactivities for compounds of interest, profiling of chemotypes, target identification, data storage and distribution? But of course, CORBEL gives you access to different services working in many different biomedical areas. If you want to screen the compounds you have identified and then use Electron Microscopy to observe their effect on a cell type of interest, there are services for you! This is just one example of how CORBEL can help boost your research project(s); don’t forget, we are 37 partners! As part of the WP4, Commun