ChEMBL Resources

Thursday, 19 February 2009

SARfari - An Overview

SARfari is an integration platform built on top of our databases (StARlite, CandiStore and DrugStore). It also includes an open architecture for loading proprietary (or other third-party) data. This loading is performed against a set of local 'business rules' that define chemical structure representation, target mapping, assay units, etc. The SARfaris are written in the Catalyst MVC framework (essentially structured Perl), with Apache as the application server.

The original idea was to mirror, in an informatics system, a 'platform' view of drug discovery: in this case, integrated data within a gene family of interest (though it could equally be organised around a metabolome-based view, e.g. adenine binding proteins, or an entire genome). Our first foray into the area was in 2004, with a GPCR system built on SQLite; however, this was not very extensible, and we quickly reached the limits of what it could technically support. We then built a protein kinase version, followed by a rhodopsin-like GPCR SARfari on Oracle, and added the ability to load 'local' data.

SARfari is quite neat in that it integrates SAR, sequence alignment, binding site and 3-D structure data into a single, simple portal, with, of course, a focus on drug discovery processes. One additional thing we did was to process crystal structures disclosed in patents (which often never make it into the RCSB PDB) into a usable form, and then load them into a version of SARfari.
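To make the idea of loading 'business rules' a little more concrete, here is a minimal sketch of two such rules: normalising activity values to a single unit and mapping in-house target codes to canonical identifiers. This is not SARfari code (SARfari itself is Perl/Catalyst on Oracle); the `Activity` record, the unit table, the `TO_NM` factors and the target codes are all hypothetical, chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, replace

# Hypothetical rule: conversion factors from accepted units to nanomolar.
TO_NM = {"M": 1e9, "mM": 1e6, "uM": 1e3, "nM": 1.0, "pM": 1e-3}


@dataclass(frozen=True)
class Activity:
    """Hypothetical activity record as it might arrive from a local lab file."""
    compound_id: str
    target_id: str
    value: float
    units: str


def normalise_units(record: Activity) -> Activity:
    """Return a copy of the record with its value expressed in nM.

    Unknown units raise ValueError, so unrecognised data is rejected
    rather than loaded silently.
    """
    factor = TO_NM.get(record.units)
    if factor is None:
        raise ValueError(f"unsupported unit: {record.units!r}")
    return replace(record, value=record.value * factor, units="nM")


def map_target(record: Activity, target_map: dict[str, str]) -> Activity:
    """Replace an in-house target code with a canonical identifier."""
    if record.target_id not in target_map:
        raise ValueError(f"unmapped target: {record.target_id!r}")
    return replace(record, target_id=target_map[record.target_id])


def apply_rules(record: Activity, target_map: dict[str, str]) -> Activity:
    """Apply the (hypothetical) rule set in order; the input is not modified."""
    return map_target(normalise_units(record), target_map)
```

For example, `apply_rules(Activity("CPD-1", "LAB-001", 2.5, "uM"), {"LAB-001": "TGT-A"})` yields a record with value 2500.0, units "nM" and target "TGT-A". Rejecting records that fail a rule, rather than guessing, keeps locally loaded data consistent with the public data it is integrated with.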

Our future plans include building a generic SARfari builder: you would paste the sequence of an arbitrary target into a web page and, after a short while, receive fully integrated and federated data as a new stand-alone, gene-family-themed web application.

At the EMBL-EBI we will host copies of Kinase SARfari and GPCR SARfari, populated with the relevant public-domain data from our own databases. The software systems (including source code) will be downloadable in toto and gratis, and can be installed locally for loading local lab data (we do not plan to allow uploading of data to the EMBL-EBI SARfaris). At the moment, SARfari requires Oracle 9 and the Symyx chemical data cartridge, but future development will be directed towards a more generic, Open Source solution, including the CDK. If anyone would like to try the existing SARfari systems in advance, please feel free to contact us now.

The same software infrastructure and look-and-feel will be used for the DrugEBIlity project at the EMBL-EBI.

4 comments:

Mike Siani-Rose said...

Can someone help me find the Catalyst MVC framework (structured Perl) scripts for reading in the data in ks_activity.csv? I figured this might be a good place to start, since there is so much data and 540 different assays!

Thank you,
Mike

mark said...

Hi Mike,

We do not have any Perl scripts designed specifically for parsing the ks_activity.csv file. The Perl Catalyst MVC code referenced in the blog post is actually the Kinase SARfari interface, which pulls its data from an Oracle database. You can access the interface here. The bioactivity section of the user guide should give you more information on how to query the data online.

If you want to run more advanced queries than the interface offers, you could consider loading the file into a local database.

Hope this helps,

Mark
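Mark's suggestion of loading the file into a local database can be sketched in a few lines with Python's standard `csv` and `sqlite3` modules. This is a minimal sketch, not a ChEMBL tool: it makes no assumption about the actual columns of ks_activity.csv and simply creates one TEXT column per header field; the table name `activity` and the database file name are hypothetical.

```python
from __future__ import annotations

import csv
import sqlite3
from collections.abc import Iterable


def load_csv(conn: sqlite3.Connection, table: str, lines: Iterable[str]) -> list[str]:
    """Create `table` from the CSV header row and insert every data row.

    Columns are stored as TEXT; cast in SQL as needed. Header names are
    double-quoted so names containing spaces still work. Returns the header.
    """
    reader = csv.reader(lines)
    header = next(reader)
    cols = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    placeholders = ", ".join("?" for _ in header)
    # Rows with the wrong number of fields will raise sqlite3.ProgrammingError.
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()
    return header


def count_distinct(conn: sqlite3.Connection, table: str, column: str) -> int:
    """Count distinct values in one column, e.g. an assay identifier."""
    row = conn.execute(f'SELECT COUNT(DISTINCT "{column}") FROM "{table}"').fetchone()
    return row[0]
```

Usage might look like `conn = sqlite3.connect("sarfari_local.db")`, then `with open("ks_activity.csv", newline="") as fh: load_csv(conn, "activity", fh)`, after which any SQL query (filters, joins, `GROUP BY` over assays) can be run against the `activity` table. Storing everything as TEXT keeps the loader independent of the file layout, at the cost of explicit casts in numeric queries.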

Mike Siani-Rose said...

Thank you, Mark. Is there a more detailed explanation of the "Assay Name"? I detect 540 different assay names; I find the codes difficult to interpret by eye.

mark said...

Hi Mike,

Apologies for the delay in responding.

I am not too sure where the term "Assay Name" is coming from; possibly it is an incorrectly named column header. Could you tell me where you saw this term (in the downloads or in the interface)? Also, if you have more detailed questions, please use the chembl-help at ebi.ac.uk mailing address.

Thanks,

Mark