ChEMBL Resources


Thursday, 27 May 2010

ChEMBL_04 downloads now available

We are pleased to announce the release of the latest version of the ChEMBL database: ChEMBL_04.

In addition to new data from the primary scientific literature, three neglected-disease datasets have been deposited in the database: Plasmodium falciparum screening data from GSK, Novartis/GNF and St Jude Children's Research Hospital. These datasets can be identified by their src_id in the compound_records and assays tables. For more information, or to download the full deposited datasets, please visit the new ChEMBL-NTD website:
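As a rough sketch of how the deposited datasets might be separated out in a local copy of the database, the example below filters compound_records by src_id and counts records per source. It runs against a tiny in-memory SQLite stand-in. The column subsets, the `source` lookup table layout, the helper names `records_for_source` and `counts_by_source`, and every src_id value and description are hypothetical placeholders. Please check the real src_id values for the GSK, Novartis/GNF and St Jude depositions in your downloaded release.

```python
import sqlite3

# Tiny in-memory stand-in for two ChEMBL tables.
# HYPOTHETICAL: column subsets, src_id values and descriptions are
# placeholders for illustration, not the real chembl_04 contents.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source (src_id INTEGER PRIMARY KEY, src_description TEXT);
CREATE TABLE compound_records (
    record_id INTEGER PRIMARY KEY,
    molregno  INTEGER,
    src_id    INTEGER REFERENCES source(src_id)
);
INSERT INTO source VALUES (1, 'Scientific literature'),
                          (901, 'Hypothetical NTD deposition');
INSERT INTO compound_records VALUES (10, 100, 1), (11, 101, 901), (12, 102, 901);
""")


def records_for_source(conn, src_id):
    """Return (record_id, molregno) pairs deposited under a single src_id."""
    cur = conn.execute(
        "SELECT record_id, molregno FROM compound_records "
        "WHERE src_id = ? ORDER BY record_id",
        (src_id,),  # parameter substitution rather than string formatting
    )
    return cur.fetchall()


def counts_by_source(conn):
    """Count compound records per source, joined to the source description."""
    cur = conn.execute("""
        SELECT s.src_description, COUNT(*)
        FROM compound_records AS cr
        JOIN source AS s ON s.src_id = cr.src_id
        GROUP BY s.src_id, s.src_description
        ORDER BY s.src_id
    """)
    return cur.fetchall()


ntd_records = records_for_source(conn, 901)
print(ntd_records)  # [(11, 101), (12, 102)]
print(counts_by_source(conn))  # [('Scientific literature', 1), ('Hypothetical NTD deposition', 2)]
```

The same `WHERE src_id = ?` filter should carry over to the assays table, since the post says both tables record the src_id.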

You can access the data via the ChEMBL database interface:

Alternatively, you can download the ChEMBL database (Oracle 9i, 10g, 11g or MySQL) from our FTP site:

For details of upcoming webinars, please see:

Wednesday, 26 May 2010

4th Annual Forum for SMEs - October 18th-19th, Munich

The 4th Annual Forum for SMEs is being held in Munich on 18-19 October 2010. The meeting aims to showcase the facilities and support that EMBL-EBI offers to Small and Medium Enterprises (SMEs). It also includes a series of sessions from the European Patent Office (EPO).

Further details can be found on the ENFIN website.

ChEMBL-NTD Interface Walkthrough - Thursday 24th June, 3pm BST

We are running a brief web-based walkthrough of the ChEMBL-NTD deposited data and interface on Thursday 24th June 2010 at 3pm BST. If you would like to receive links to the meeting and a telephone number to call, please mail us. (Please, please, please use this link to mail us; otherwise things easily get lost.)

Secure Searches now available for ChEMBLdb, ChEMBL-NTD and Kinase SARfari

You can now access ChEMBLdb, ChEMBL-NTD and Kinase SARfari over the https protocol. This encrypts the traffic between your web browser and our server, and also validates the identity of our server (so you know you are using the real ChEMBL site at the EMBL-EBI and not an impostor). Use of all ChEMBL resources is also subject to the standard EMBL Terms of Use.

You may wish to update any bookmarks you have saved to use these new URLs.

The URLs are: for ChEMBLdb

and for ChEMBL-NTD
and for Kinase SARfari

The standard http addresses will continue to work.

Sunday, 23 May 2010

Part-time, Home-Based Curator - Biological Drugs

We are looking to appoint a part-time, home-based curator to cover biological drugs for the Open Access, Open Data drug discovery database ChEMBL. The role will require some previous experience of biological drug discovery and development, and a broad familiarity with the field. The biological drug portion of ChEMBL covers all classes of 'biotechnology' drugs, e.g. mAbs, aptamers, enzymes, peptides, solubilised receptors, etc. You will need your own computer and software (Excel or an open-source equivalent), bioinformatics experience, an investigative mind, and an eye for detail and thoroughness.

If you are interested in this opportunity, mail us for further details, and please attach a current CV.

Friday, 21 May 2010

ChEMBL Schema and Interface Demos

We're running two more ChEMBL webinars:

A schema walkthrough, for those interested in downloading the database and understanding the schema/data model, on Thursday 10th June at 3pm BST. Please click here to sign up.

A demonstration of the ChEMBL web-interface and its features - on Thursday 17th June at 3pm BST. Please click here to sign up.

We'll use webhuddle for the slides/demo. You don't need an account or any software, just a Java-enabled web browser. There will also be a separate number to dial for the audio.

Wednesday, 19 May 2010

Open Data for Neglected Tropical Disease Discovery, and Release of ChEMBL-04

It was clearly a slow news day in Swindon that day; but, in a way, wouldn't it be nice to live in a place where this was big news? I, for one, am glad the tortoise is OK (google the headline and you'll get the full, detailed story).

Anyway, there are some significant publications in Nature this week on high-throughput screening (HTS) and follow-up for malaria (the papers are currently free to read: Gamo et al. and Guiguemde et al.). There are also some press releases for these papers and the public data release. We won't repeat the content of these formal announcements, but here provide some informal commentary...

The magic data pixies here at EMBL-EBI have been working hard, and we have loaded all the data into the latest release of our SAR database, ChEMBL04. The data is now live in the web interface, and the FTP download of the whole database will follow in the near future. (We are still optimising our production processes, so apologies that the data is available in the front end before the download files are fully ready and tested; we took the view that people would probably not want us to hold back access where possible. The gap between loading into the front-end schema and the packaged export release will shorten.)

We have also put together a 'microsite' called ChEMBL-NTD (NTD stands for Neglected Tropical Disease), accessible at:

This site showcases and provides easy download of raw data from ChEMBL for this strategically important set of diseases, and also allows us to add extra visualisation functionality that isn't available in the ChEMBL front end. We have some exciting plans for community annotation of these datasets; more on this later. At the moment, there are download links, in a variety of common formats, for the GSK, St Jude and Novartis datasets. Unfortunately, we only had time to build interactive query tools for the ChEMBL Plasmodium and GSK datasets; but rest assured, we're putting together some tools for cross-dataset analysis and querying (within the scientific limits of analysing large sets of single-point screening data).

As you will probably guess, there are more datasets in the pipeline for release, and we would be delighted if others with similar datasets would be interested in publicly archiving them here at the EMBL-EBI. As always, all the EMBL-EBI data is freely accessible, redistributable, etc.

If you have any feedback on data formats, the interface, etc., please let us know.

ChEMBL04 contains 680,293 compound records, 565,243 distinct compounds, and 2,705,136 assay data points.

Finally, a heartfelt thanks to the many people who have helped us put this together, championed the release of data from their organisations, and acted as the social glue that is so important in getting these sorts of things actually done. As the youth the world over now say: respect to Rick Keenan, Jose Garcia-Bustos, Frederic Bost, Pascal Fantauzzi, Richard Glynne, Thierry Diagana, Anang Shelat and Kip Guy!

%T Thousands of chemical starting points for antimalarial lead identification
%J Nature
%V 465
%P 305-310
%D 2010
%A F.-J. Gamo
%A L.M. Sanz
%A J. Vidal
%A C. de Cozar
%A E. Alvarez
%A J.-L. Lavandera
%A D.E. Vanderwall
%A D.V.S. Green
%A V. Kumar 
%A S. Hasan
%A J.R. Brown
%A C.E. Peishoff
%A L.R. Cardon
%A J.F. Garcia-Bustos

%T Chemical genetics of Plasmodium falciparum
%J Nature
%V 465
%P 311-315
%D 2010
%A W.A. Guiguemde 
%A A.A. Shelat
%A D. Bouck
%A S. Duffy
%A G.J. Crowther
%A P.H. Davis
%A D.C. Smithson
%A M. Connelly
%A J. Clark
%A F. Zhu
%A M.B. Jiménez-Díaz
%A M.S. Martinez
%A E.B. Wilson
%A A.K. Tripathi 
%A J. Gut
%A E.R. Sharlow
%A I. Bathurst
%A F. El Mazouni
%A J.W. Fowble 
%A I. Forquer
%A P.L. McGinley
%A S. Angulo-Barturen
%A S. Ferrer
%A P.J. Rosenthal
%A J.L. DeRisi
%A D.J. Sullivan Jr.
%A J.S. Lazo
%A D.S. Roos
%A M.K. Riscoe
%A M.A. Phillips
%A P.K. Rathod 
%A W.C. Van Voorhis
%A V.M. Avery 
%A R.K. Guy

Wednesday, 12 May 2010

Enhanced Interface For ChEMBL Now Available

Thanks for all the feedback so far on the usability of the ChEMBL interface; we have been working hard and are pleased to announce the release of a significantly enhanced interface. As regular users will rapidly see, there is a completely new tabbed interface structure, along with many other enhancements across the system.

We also greatly appreciate the reporting of any errors in the data contained within the database - so please keep it up, your work benefits the entire community.

We are also making progress on porting the chemical infrastructure to the fantastic OrChem Open Source chemistry plugin for Oracle databases. An Open Access publication for OrChem can be found here. Watch this space for more news.

Wednesday, 5 May 2010

Informatics to support biological drug discovery

The Industry Programme at the EMBL-EBI is currently planning a workshop for approximately 40 attendees later in the year, covering informatics methods and resources used in the discovery of biological drugs. We currently plan to include a review of historic biological drug discovery, a review of attrition for biological drugs, patent databases, mAb structure and engineering, enzyme replacement therapies and other replacement therapies, methods to address immunogenicity, solubilised receptors, RNA and aptamer-based therapeutics, etc. Experts from industry and academia will present on the various subjects.

We would greatly welcome ideas for inclusion in the workshop, volunteer speakers, and so forth. We would also like to hear from you if you are potentially interested in attending.

Tuesday, 4 May 2010

Spanish Postdoctoral Fellowships

There is a scheme to fund gifted Spanish researchers (nationals or residents) at EMBL; details are on the following link. The ChEMBL group would welcome the opportunity to host researchers in suitable areas (bioinformatics applied to drug discovery, chemogenomics, knowledge discovery from pharmaceutical data, etc.). The deadline is May 29th 2010!

ChEMBL_03 now available on ftp site

ChEMBL_03 is now available for download from the FTP site. We will also add new releases to the new BioTorrents network (probably once we get the hang of these new-fangled Internets; many thanks to Egon for his input and advice!).