ChEMBL Resources

The SARfaris: GPCR, Kinase, ADME

Saturday, 28 February 2009

Books and Papers - 6 - Software Tools, Kernighan and Plauger

This was the first proper programming book that I studied, and it is an old one too, written before the web, web services, and networking - good old-fashioned UNIX programming. It has few words and lots of concepts, and merits revisiting now and then - the prose in Kernighan's books is just excellent, perfectly paced, and combines advice with little example code snippets. Probably the best thing about the book (for me) is the use of ratfor - a derivative of Fortran that looks a lot like C, and has some of the best elements of both languages. It is also incredibly quick to code in. I have just downloaded some updated versions of ratfor for my Mac, so expect some pretty unusual-looking Open Source protein analysis tools any day soon!

Anyway, although the book uses ratfor, for almost everyone the code will simply serve as pseudocode - a starting point and inspiration for an implementation in a more fashionable language. If anything, using a non-current language forces you to think more deeply about the actual structure of the program.

%A B.W. Kernighan
%A P.J. Plauger
%T Software Tools
%I Addison Wesley
%D 1976
%O ISBN 978-0201036695

Wednesday, 25 February 2009


A brief link to a developing project under the ESFRI Research Infrastructure Roadmap - EU-OPENSCREEN. The basic idea is to develop and establish an underpinning infrastructure in a pan-European setting to allow screening and chemical tool identification for new, therapeutically/commercially unvalidated protein targets. This is a fantastically important project at the current time, both scientifically and economically, so I hope you are also interested in its progress.

Thursday, 19 February 2009

Conference - German Chemical Society, Frankfurt, August 30th 2009

We are going to present at an Open Source Drug Discovery session at the National Meeting of the German Chemical Society, held in Frankfurt am Main, August 30th to September 2nd 2009. Further details of the meeting are here

SARfari - An Overview

SARfari is an integration platform built on top of our databases (StARlite, CandiStore and DrugStore). It also includes an open architecture for the loading of proprietary (or other third-party) data. This loading is performed against a series of local 'business rules' that define chemical structure representation, target mapping, assay units, etc. The SARfaris are written in the Catalyst MVC framework (so essentially a structured Perl), with Apache as the application server. The original idea was to mirror in an informatics system a 'platform' view of drug discovery - in this case integrated data within a gene family of interest (but it could be built around a metabolome-based view, e.g. adenine binding proteins, an entire genome, etc.).

Our first foray into the area was in 2004 with a GPCR system built with SQLite. However, this was not very extensible, and we rapidly reached the limits of what was technologically possible. We then built a protein kinase version, and then a rhodopsin-like GPCR SARfari under Oracle, and included the ability to load 'local' data. SARfari is quite neat in that it integrates SAR, sequence alignment, binding site, and 3-D structure data all into a single, simple portal, with, of course, a focus on drug discovery processes. One additional thing we did here was to process patent crystal structures (which often never make it into the RCSB PDB) into a usable form, and then load these into a version of SARfari.
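To make the idea of a loading 'business rule' a little more concrete, here is a minimal sketch of what an assay-unit rule might look like. It is purely illustrative and is not taken from the SARfari code (which is Perl/Catalyst): the table `UNIT_TO_NM`, the function `normalise_to_nm`, and the choice of nanomolar as the canonical unit are all assumptions made for this example.

```python
# Hypothetical assay-unit rule: map incoming concentration units onto one
# canonical unit (nM here, an assumption) before data is loaded.
UNIT_TO_NM = {
    "M": 1e9,
    "mM": 1e6,
    "uM": 1e3,
    "nM": 1.0,
    "pM": 1e-3,
}


def normalise_to_nm(value: float, unit: str) -> float:
    """Convert a concentration to nM, rejecting units with no rule defined."""
    try:
        factor = UNIT_TO_NM[unit]
    except KeyError:
        # Refuse to guess: unknown units are flagged rather than loaded.
        raise ValueError(f"no business rule for unit {unit!r}") from None
    return value * factor


if __name__ == "__main__":
    print(normalise_to_nm(1.5, "uM"))
```

Rejecting unknown units outright, rather than passing them through, is one possible design choice; a local installation could equally well queue such records for manual curation.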

Our future plans include building a generic SARfari builder (so you will be able to paste a sequence of an arbitrary target into a web page, and then, after a short while, fully integrated and federated data will be delivered, as a new stand-alone gene-family themed web application).

At the EMBL-EBI we will host a copy of kinase SARfari and GPCR SARfari, populated with the relevant public-domain data from our own databases. The software systems (including source code) will be downloadable in toto and gratis, and installable locally for loading of local lab data (we do not plan to allow upload of data onto the EMBL-EBI SARfaris). At the moment, SARfari requires Oracle 9 and the Symyx chemical data cartridge, but future development will be directed towards a more generic and Open Source solution, including the CDK. If anyone would like to try the existing SARfari systems in advance, please feel free to contact us now.

The same software infrastructure and look-and-feel will be used for the DrugEBIlity project at the EMBL-EBI.

Tuesday, 17 February 2009

Course - Practical Aspects of Small Molecule Drug Discovery

A brief alert to a Wellcome Trust course being held here on campus this July.

And while we are on the subject of conferences, the RSC is organising a meeting on the 1st October 2009 on 'Chemical Tools and Challenges in Systems Biology', to be held in Stevenage in the U.K. jpo is speaking at this meeting on the ChEMBL project. There is nothing on the RSC website yet, but it looks like it should be a good meeting.

Sunday, 15 February 2009

Books and Papers - 5 - Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison

You know what it's like - you have a deadline, stuff that's really important to do, you get your work area ready, and then you browse your bookcase for something interesting. Four hours later, it's time to do something else. Well, that was yesterday, and I spent those hours re-reading this old classic, which was fairly recently re-released in a reprinted, cheaper form. In my opinion, this is one of the best books on sequence comparison: it is full of interesting ideas, has good coverage of related fields of computer science, and covers the algorithms in enough depth to allow you to go away and start messing around with code.

%D 2000
%E David Sankoff & Joseph Kruskal
%T Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison
%I Cambridge University Press
%O ISBN 978-1575862170
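
As a taste of the kind of code the book invites you to mess around with, here is a minimal sketch of the classic dynamic-programming string edit (Levenshtein) distance. It is my own illustration rather than code from the book; the function name `edit_distance` and the unit cost for every insertion, deletion and substitution are assumptions chosen for simplicity.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b (unit costs)."""
    # previous[j] is the distance between the prefix of a handled so far
    # (initially the empty prefix) and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        # Turning a[:i] into the empty string takes i deletions.
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            substitution_cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # delete ch_a
                    current[j - 1] + 1,                   # insert ch_b
                    previous[j - 1] + substitution_cost,  # match or substitute
                )
            )
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

Only two rows of the table are kept at any time, so memory grows with the length of the second string only. Recovering the alignment itself, rather than just its cost, would need the full table or a traceback.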

Thursday, 12 February 2009

EMBL-EBI Industry Programme

I thought I would post a note on the EMBL-EBI's Industry Programme. This is a forum open to all life-science companies to network and understand the tools, resources and direction of the EMBL-EBI's activities, and also to encourage discussion of pre-competitive and collaborative activities between life-science companies. Given the increasing demands for cost reduction, data integration and knowledge representation within the sector, this activity is becoming increasingly important. Details of how to join can be found on the EMBL-EBI web-site (link above). An analogous group exists for smaller companies (SME/SMBs).