Thursday, 9 September 2010
EMBL-EBI Small Molecule Bioactivity Course - Feb 2011
Just posted on the EMBL-EBI website are the first details of the Small Molecule Bioactivity Course. More details will follow, and the agenda still needs sketching out, but the dates and guest lecturers are all set. So, if you would like to take part in an introductory-level course on the use of chemogenomics approaches to understanding biology and supporting healthcare research, keep an eye out for more details, or, in case all the places go, take a chance and register now!
On a related note, I would like to highlight the overall good-all-round-goodness and helpfulness of Noel O'Boyle; in honour, I may even go back to using the Irish form of my surname O'Verington.....
The image above will mean little to most, but a lot to the few who watch UK kids TV.
Friday, 3 September 2010
ChEMBL_06 is live
We're pleased to announce the release, a few moments ago, of ChEMBL_06. This contains an additional 29,142 compound records and 138,348 new bioactivities. We've also done quite a lot of compound cleanup, names, research codes (vide infra), and so forth. A variety of database dumps are available from the public ftp site, and the live web database is now connected to ChEMBL_06.
Additional data in this release include the standard literature data, but also the data from the brilliant Genomics of Drug Sensitivity in Cancer project, coordinated by Ultan McDermott at the Sanger Center (interested readers in the oncology area are also pointed to this previous blog post).
2010 New Drug Approval - Pt. XI - Alcaftadine (Lastacaft)

This allergic reaction is most familiar in patients with hay fever but can also be caused by other allergens such as dust mites, moulds, perfumes etc. It causes red, itchy and watery eyes.
Allergic conjunctivitis is caused by a type I hypersensitivity reaction of the immune system. Antigenic epitopes of the allergen are detected by IgE antibodies which mediate the excessive activation of mast cells and basophils. The symptoms of allergic conjunctivitis are mainly caused by the release of histamine from these activated immune cells. Histamine increases the permeability of blood vessels and stimulates the activity of immune cells, through a number of differing histamine receptors.
Alcaftadine and its carboxylic acid metabolite (produced via a non-P450 route) are antagonists of the H1 histamine receptor (Uniprot: P35367) and also inhibit histamine release.
Alcaftadine is administered topically as a 0.25% solution. In a pharmacokinetics study, the plasma Cmax of Alcaftadine is 60 pg/mL and occurs after 15 minutes; the plasma Cmax of the active metabolite is 10 pg/mL and occurs after one hour. Plasma protein binding (ppb) for Alcaftadine is 39.2%, and for the carboxylic acid metabolite is 62.7%. The elimination half-life of the metabolite is approximately 2 hours. The presence of the aldehyde is an unusual chemical feature in Alcaftadine, since aldehydes are usually quite reactive; as would be expected, this group is readily metabolized to a carboxylic acid.
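As a rough worked illustration of what an elimination half-life of about 2 hours means in practice, here is a minimal Python sketch. It assumes simple first-order (mono-exponential) decline from the peak; that single-compartment model and the function name are assumptions made for this example and are not taken from the prescribing information.

```python
def remaining_fraction(half_life_h, hours_after_peak):
    """Fraction of the peak concentration left after first-order decay.

    Assumes mono-exponential elimination: f(t) = 0.5 ** (t / t_half).
    """
    return 0.5 ** (hours_after_peak / half_life_h)


# Approximate half-life quoted in the text for the metabolite.
METABOLITE_HALF_LIFE_H = 2.0

for hours in (0, 2, 4, 6):
    fraction = remaining_fraction(METABOLITE_HALF_LIFE_H, hours)
    print(f"{hours} h after peak: {fraction:.1%} of peak remaining")
```

Under this assumption, each additional two hours halves whatever is left, so roughly an eighth of the peak level remains six hours after the peak.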
The full prescribing information is here.
Adverse reactions may include eye irritation, eye redness, nasopharyngitis, headache and influenza.
IUPAC: 11-(1-methylpiperidin-4-ylidene)-5,6-dihydroimidazo[2,3-b][3]benzazepine-3-carbaldehyde
SMILES: CN1CCC(CC1)=C2c3ccccc3CCn4c(C=O)cnc24
InChI: 1S/C19H21N3O/c1-21-9-6-15(7-10-21)18-17-5-3-2-4-14(17)8-11-22-16(13-23)12-20-19(18)22/h2-5,12-13H,6-11H2,1H3
Alcaftadine was developed by the Janssen Research Foundation and will be marketed in the US under the name Lastacaft by Vistakon Pharmaceuticals.
Monday, 30 August 2010
Innovation and Ownership in Drug Discovery by Country (maybe, perhaps, well maybe not then!)
I've been looking at the Research Code data recently, and here is an interesting plot. It is the counts of Research Codes classified by Country. It is a first, look-see plot, based on currently incomplete data, but I think it is quite interesting nonetheless.
A basic assumption behind the assignment of a distinct research code stem is that they reflect an autonomous entity with the aim of discovering drugs. Today the majority of newly founded entities will be funded by private/VC money, and these will be acquired by a larger company once some degree of commercial success, or anticipated commercial potential has been achieved. Our data is a 'blend' of recent and historical data, and over time, the structure and scale of research has changed (a smaller number of companies in the distant past, and a larger number from the mid 1990s onwards as a large number of biotechs were established; also there will be differences across various countries).
The way we have collected the research codes (773 of them so far) will focus on clinical stage compounds, and therefore the ability of that company and associated infrastructure to move compounds through into clinical development. In our tables the research code has a 'currently controlling company' assigned to it, and this company has a 'country' assigned to it - this is the location of its corporate headquarters, and to a first approximation will record where the controlling rights/IP is now held (ignoring any specific licensing deals that have been done over specific drugs). Of course, the location of the headquarters does not reflect where the work is, or has been historically, done. Many current companies have multiple research codes, for example Pfizer has 32 distinct historical research codes, and this count will correlate with a number of mergers and acquisitions over time; these mergers will sometimes switch 'ownership' from one country to another.
The counts follow a classic power-law distribution (80:20 rule, or a whole bunch of other similar names) - specifically, six countries (of 27) cover 86% of research code stems (the USA, Japan, Germany, France, the UK and Switzerland). To my mind there are a few surprises; for example, the relatively high rank of Japan - this may reflect a complex corporate history of mergers; there are certainly few biotechs in Japan producing clinical candidates, but I just don't know yet. Secondly, Sweden seems lower than I would have expected, but this may be down to mergers transferring 'corporate ownership' from one country to another (Astra and Pharmacia). Conversely, Italy seems higher than I would have initially predicted - but maybe I don't know the history of the industry as well as I should.
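For anyone wanting to reproduce this kind of "top few countries cover most of the codes" figure from their own table, here is a minimal Python sketch. The counts and the single-letter country labels are hypothetical placeholders, not the real ChEMBL figures, and the function name is made up for this example. It only measures concentration; it does not test whether the data actually follow a power law.

```python
from collections import Counter


def cumulative_share(codes_by_country):
    """Return (country, count, cumulative fraction) rows, largest count first."""
    total = sum(codes_by_country.values())
    rows, running = [], 0
    for country, count in Counter(codes_by_country).most_common():
        running += count
        rows.append((country, count, running / total))
    return rows


# Hypothetical counts for illustration only; NOT the real research code data.
example_counts = {"A": 50, "B": 20, "C": 15, "D": 10, "E": 3, "F": 2}

for country, count, share in cumulative_share(example_counts):
    print(f"{country}: {count:3d} codes, cumulative {share:.0%}")
```

Reading down the cumulative column shows how many countries are needed to reach a given share of all research code stems.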
Another obvious feature is the low current rank of India and China - although a lot of basic research and outsourcing is done in these territories now, very little of this is currently owned and coordinated by companies headquartered there.
I've given up on trying to use google docs for any of this stuff - it is not that stable for me, and so if anyone is interested in the underlying spreadsheet, mail me....
Friday, 27 August 2010
Current GPCR X-ray structures
As part of resurrecting GPCR SARfari from the ashes, we needed to refresh the protein structure content. There are now a surprisingly large number of distinct X-ray structures of family A, rhodopsin-like GPCRs known; of course it is never enough, but large nonetheless. There are five distinct proteins (bovine and squid rhodopsin, human beta-2 adrenergic receptor, turkey beta-1 adrenergic receptor and human Adenosine A2A receptor). These are known in a variety of different liganded states, crystal forms and resolutions, and also with differing numbers of distinct chains within crystallographic asymmetric units. So in total, there are 27 distinct X-ray PDB entries and 45 distinct GPCR domain structures.
Here is a table, as of 27th August 2010.
| PDB code | Chain | Protein | Ligand | Species | Resolution (Å) | Date |
| 1f88 | A | Rhodopsin | retinal | Bos taurus | 2.8 | 4 Aug 2000 |
| 1f88 | B | Rhodopsin | retinal | Bos taurus | 2.8 | 4 Aug 2000 |
| 1gzm | A | Rhodopsin | retinal | Bos taurus | 2.6 | 20 Nov 2003 |
| 1gzm | B | Rhodopsin | retinal | Bos taurus | 2.6 | 20 Nov 2003 |
| 1hzx | A | Rhodopsin | retinal | Bos taurus | 2.8 | 4 Jul 2001 |
| 1hzx | B | Rhodopsin | retinal | Bos taurus | 2.8 | 4 Jul 2001 |
| 1l9h | A | Rhodopsin | retinal | Bos taurus | 2.6 | 15 May 2002 |
| 1l9h | B | Rhodopsin | retinal | Bos taurus | 2.6 | 15 May 2002 |
| 1u19 | A | Rhodopsin | retinal | Bos taurus | 2.2 | 12 Oct 2004 |
| 1u19 | B | Rhodopsin | retinal | Bos taurus | 2.2 | 12 Oct 2004 |
| 2g87 | A | Rhodopsin | retinal | Bos taurus | 2.6 | 2 Mar 2006 |
| 2g87 | B | Rhodopsin | retinal | Bos taurus | 2.6 | 2 Mar 2006 |
| 2hpy | A | Rhodopsin | retinal | Bos taurus | 2.8 | 18 Jul 2006 |
| 2hpy | B | Rhodopsin | retinal | Bos taurus | 2.8 | 18 Jul 2006 |
| 2ped | A | Rhodopsin | retinal | Bos taurus | 2.9 | 2 Apr 2007 |
| 2ped | B | Rhodopsin | retinal | Bos taurus | 2.9 | 2 Apr 2007 |
| 2i36 | A | Rhodopsin | apo | Bos taurus | 4.1 | 17 Oct 2006 |
| 2i36 | B | Rhodopsin | apo | Bos taurus | 4.1 | 17 Oct 2006 |
| 2i36 | C | Rhodopsin | apo | Bos taurus | 4.1 | 17 Oct 2006 |
| 2i37 | A | Rhodopsin | apo | Bos taurus | 4.1 | 17 Oct 2006 |
| 2i37 | B | Rhodopsin | apo | Bos taurus | 4.1 | 17 Oct 2006 |
| 2i37 | C | Rhodopsin | apo | Bos taurus | 4.1 | 17 Oct 2006 |
| 2j4y | A | Rhodopsin | retinal | Bos taurus | 3.4 | 25 Sep 2007 |
| 2j4y | B | Rhodopsin | retinal | Bos taurus | 3.4 | 25 Sep 2007 |
| 3cap | A | Rhodopsin | apo | Bos taurus | 2.9 | 24 Jun 2008 |
| 3cap | B | Rhodopsin | apo | Bos taurus | 2.9 | 24 Jun 2008 |
| 3c9l | A | Rhodopsin | retinal | Bos taurus | 2.6 | 5 Aug 2008 |
| 3c9m | A | Rhodopsin | retinal | Bos taurus | 3.4 | 16 Feb 2008 |
| 3dqb | A | Rhodopsin | apo | Bos taurus | 3.2 | 23 Sep 2008 |
| 2z73 | A | Rhodopsin | retinal | Todarodes pacificus | 2.5 | 13 May 2008 |
| 2z73 | B | Rhodopsin | retinal | Todarodes pacificus | 2.5 | 13 May 2008 |
| 2ziy | A | Rhodopsin | retinal | Todarodes pacificus | 3.7 | 27 Feb 2008 |
| 2r4r | A | beta-2 adrenergic receptor | apo | Homo sapiens | 3.4 | 6 Nov 2007 |
| 2r4s | A | beta-2 adrenergic receptor | apo | Homo sapiens | 3.4 | 6 Nov 2007 |
| 2rh1 | A | beta-2 adrenergic receptor | Carazolol | Homo sapiens | 2.4 | 30 Oct 2007 |
| 3d4s | A | beta-2 adrenergic receptor | Timolol | Homo sapiens | 2.8 | 17 Jun 2008 |
| 3kj6 | A | beta-2 adrenergic receptor | apo | Homo sapiens | 3.4 | 16 Feb 2010 |
| 3ny8 | A | beta-2 adrenergic receptor | ICI-118551 | Homo sapiens | 2.8 | 11 Aug 2010 |
| 3ny9 | A | beta-2 adrenergic receptor | novel analog of ICI-118551 | Homo sapiens | 2.8 | 11 Aug 2010 |
| 3nya | A | beta-2 adrenergic receptor | Alprenolol | Homo sapiens | 3.2 | 11 Aug 2010 |
| 2vt4 | A | beta-1 adrenergic receptor | Cyanopindolol | Meleagris gallopavo | 2.7 | 24 Jun 2008 |
| 2vt4 | B | beta-1 adrenergic receptor | Cyanopindolol | Meleagris gallopavo | 2.7 | 24 Jun 2008 |
| 2vt4 | C | beta-1 adrenergic receptor | Cyanopindolol | Meleagris gallopavo | 2.7 | 24 Jun 2008 |
| 2vt4 | D | beta-1 adrenergic receptor | Cyanopindolol | Meleagris gallopavo | 2.7 | 24 Jun 2008 |
| 3eml | A | Adenosine A2a receptor | ZM-241385 | Homo sapiens | 2.6 | 14 Oct 2008 |
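The difference between "PDB entries" and "GPCR domain structures" (chains) in the totals quoted earlier can be checked mechanically. Below is a minimal Python sketch that parses pipe-delimited rows and counts both; it embeds only four rows copied from the table, and the column names and helper function are assumptions made for this example.

```python
from collections import defaultdict

# Column names chosen for this sketch; they mirror the table's seven columns.
COLUMNS = ("pdb", "chain", "protein", "ligand", "species", "resolution", "date")

# A few data rows copied from the table above (not the full table).
ROWS = """\
| 1f88 | A | Rhodopsin | retinal | Bos taurus | 2.8 | 4 Aug 2000 |
| 1f88 | B | Rhodopsin | retinal | Bos taurus | 2.8 | 4 Aug 2000 |
| 3d4s | A | beta-2 adrenergic receptor | Timolol | Homo sapiens | 2.8 | 17 Jun 2008 |
| 3eml | A | Adenosine A2a receptor | ZM-241385 | Homo sapiens | 2.6 | 14 Oct 2008 |
"""


def parse_rows(text):
    """Turn pipe-delimited lines into dicts keyed by COLUMNS."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        records.append(dict(zip(COLUMNS, cells)))
    return records


records = parse_rows(ROWS)

# Group chain identifiers by PDB code: one key per entry, one item per chain.
chains_per_entry = defaultdict(list)
for record in records:
    chains_per_entry[record["pdb"]].append(record["chain"])

print(len(chains_per_entry), "PDB entries,", len(records), "chains")
```

Fed the complete table instead of this subset, the same two counts should correspond to the entry and domain totals given in the post.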
Monday, 23 August 2010
More Research Code Stems
Many thanks to those of you who have sent in research code stems! I have updated this page, with about another 70, and the full table should shortly be accessible in chembldb.
Thursday, 19 August 2010
SMR Meeting On Epigenetics - 22nd September 2010
Drat! It's almost as if the SMR committee look at my Google calendar and book all their meetings on days when I'm otherwise occupied.....
Anyway, on Wednesday September 22nd 2010 the SMR are holding a meeting on Epigenetics, one of the hottest current areas of disease biology, at the NHLI, further details here.
Druggability assessment
Here are a couple of references for some work on computer-based target assessment we have been involved in.
%T The Molecular Basis of Predicting Druggability
%A Al-Lazikani, B.
%A Gaulton, A.
%A Paolini, G.
%A Lanfear, J.
%A Overington, J.
%A Hopkins, A.
%I Wiley-VCH Verlag GmbH
%O http://dx.doi.org/10.1002/9783527619368.ch36
%O DOI 10.1002/9783527619368.ch36
%P 1315-1334
%B Bioinformatics - From Genomes to Therapies
%E Lengauer, T.
%O ISBN: 978-3-527-31278-8
%D 2007

%T The Molecular Basis of Predicting Druggability
%A Al-Lazikani, B.
%A Gaulton, A.
%A Paolini, G.
%A Lanfear, J.
%A Overington, J.
%A Hopkins, A.
%I Wiley-VCH Verlag GmbH
%O http://dx.doi.org/10.1002/9783527619375.ch14b
%O DOI 10.1002/9783527619375.ch14b
%P 804-823
%B Chemical Biology: From Small Molecules To Systems Biology and Drug Design
%E Schreiber, S.L., Kapoor, T.M., & Wess, G.
%O ISBN: 978-3-527-31150-7
%D 2007
Tuesday, 17 August 2010
2010 New Drug Approval - Pt. X - Ulipristal Acetate (Ella)
The most recent approval by the FDA is Ulipristal Acetate, approved on August 13th 2010 under the trade name Ella. Ulipristal Acetate (previously known by the research code CDB-2914 or VA-2914) is a progesterone agonist/antagonist emergency contraceptive, indicated for prevention of pregnancy following unprotected intercourse or known or suspected contraceptive failure.
This drug is a selective progesterone receptor modulator (SPRM) with antagonist and partial agonist effects (a progesterone agonist/antagonist) at the progesterone receptor (PR, NR3C3) (Uniprot code: P06401). The Progesterone Receptor is a member of a very significant family of proteins for drug discovery, the Nuclear Receptors, a family of around 50 genes which are transcription factors; transcription by NRs is usually ligand regulated. Ulipristal Acetate prevents progesterone, the endogenous ligand, from occupying its receptor. Ulipristal Acetate binds in the ligand binding domain (LBD) of PR (PFAM: PF00104).
Several structures of PR complexed with ligands are known; a representative one is PDB: 3D90.
Ulipristal Acetate will compete with Levonorgestrel, another progestagen available on the market, which is approved for use up to three days post-intercourse as opposed to five days in the case of Ulipristal Acetate.
Ulipristal Acetate is a small-molecule, natural product derived drug (Molecular Weight 475.6 g·mol⁻¹), Rule-of-Five compliant, and it is delivered as a tablet. Ulipristal Acetate is highly bound to plasma proteins (>94%), including high density lipoprotein, alpha-1-acid glycoprotein, and albumin. It is metabolized to mono- and di-demethylated metabolites, mostly by CYP3A4; the mono-demethylated metabolite is pharmacologically active. Ulipristal Acetate shows high affinity for the related nuclear receptor, the glucocorticoid receptor (GR, NR3C1). The terminal half-life of Ulipristal Acetate is ca. 32 hours. The recommended dosage is one tablet (30 mg) taken orally, with or without food, as soon as possible, within 120 hours (five days) after unprotected intercourse or a known or suspected contraceptive failure.
The full prescribing information can be found here.
The structure 17alpha-acetoxy-11beta-(4-N,N-dimethylaminophenyl)-19-norpregna-4,9-diene-3,20-dione is a synthetic progestagen and is thus very similar to progesterone. Like other steroid hormones of this class, Ulipristal Acetate is characterized by its basic 21-carbon skeleton, i.e., four interconnected cyclic hydrocarbons with two methyl branches and a ketone. In this particular case, one of the methyl groups is replaced by a substituted aromatic amine.
NAME="Ulipristal Acetate"
TRADEMARK_NAME="Ella"
ATC_code=NA
SMILES="CC(=O)C1(CCC2C1(CC(C3=C4CCC(=O)C=C4CCC23)C5=CC=C(C=C5)N(C)C)C)OC(=O)C"
InChI="InChI=1S/C30H37NO4/c1-18(32)30(35-19(2)33)15-14-27-25-12-8-21-16-23(34)11-13-24(21)28(25)26(17-29(27,30)3)20-6-9-22(10-7-20)31(4)5/h6-7,9-10,16,25-27H,8,11-15,17H2,1-5H3/t25-,26+,27-,29-,30-/m0/s1"
ChemDraw=Ulipristal_Acetate.cdx
The license holder is Laboratoire HRA Pharma.
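The quoted molecular weight can be cross-checked against the molecular formula carried in the InChI. The following is a minimal Python sketch that does this without any cheminformatics toolkit; the helper names and the small atomic-weight table are assumptions for this example, and it checks only the molecular-weight criterion of the Rule of Five (not the hydrogen-bond donor, acceptor or logP criteria).

```python
import re

# Standard atomic weights (g/mol), rounded; only the elements needed here.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

# InChI for Ulipristal Acetate as given in the post, without line-break spaces.
INCHI = (
    "InChI=1S/C30H37NO4/c1-18(32)30(35-19(2)33)15-14-27-25-12-8-21-16-23(34)"
    "11-13-24(21)28(25)26(17-29(27,30)3)20-6-9-22(10-7-20)31(4)5/"
    "h6-7,9-10,16,25-27H,8,11-15,17H2,1-5H3/t25-,26+,27-,29-,30-/m0/s1"
)


def formula_from_inchi(inchi):
    """Return the molecular-formula layer (second '/'-separated field)."""
    return inchi.split("/")[1]


def molecular_weight(formula):
    """Sum atomic weights for a simple formula such as 'C30H37NO4'."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHT[symbol] * (int(count) if count else 1)
    return total


formula = formula_from_inchi(INCHI)
weight = molecular_weight(formula)
print(f"{formula}: {weight:.1f} g/mol, MW within Rule-of-Five limit: {weight <= 500}")
```

The sketch handles only plain element-count formulas; charged species, isotopes or multi-component InChIs would need a proper toolkit.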
Monday, 16 August 2010
MGMS Young Modeller Forum Meeting - December 10 2010, London, UK
Saturday, 14 August 2010
Research Code to Company Name Mapping
Please feel free to download and use the spreadsheet in whichever way you please. If you find any errors, or can provide a longer list (as long as this is from publicly available sources!) that would be fantastic. Any additions will be credited appropriately.
So, Google docs is not currently working for me, but getting the list onto the blog itself, and therefore getting it indexed up in search engines, etc. was not too painful, so here is a link to an HTML table (sorted by company name).
Wednesday, 11 August 2010
USAN Watch - August 2010
The August 2010 USANs have just been published; these are:
| USAN | Research code | Drug Type | Drug Class | Target |
| Alvocidib | Flavopiridol, HL-275, HMR-1275, L86-8275, MDL-107826A, NSC-649890 | Synthetic small molecule | therapeutic | CDK inhibitor |
| Danoprevir | R05190591, RG-7227, ITMN-191 | Synthetic small molecule | therapeutic | HCV Proteinase inhibitor |
| Latrepirdine | | Synthetic small molecule | therapeutic | Complex MOA |
| Lunacalcipol | CTA-018 | Natural product-derived | therapeutic | Vitamin D receptor |
| Mavrilimumab | CAM-3001 | mAb | therapeutic | GMCSFr alpha-chain |
| Moxetumomab pasudotox | CAT-8015, HA22 | mAb | therapeutic | CD-22 |
| Semuloparin sodium | AVE-5026 | Oligosaccharide | therapeutic | Antithrombin III |
It may be of interest to note the USANs "Semuloparin" and "Semuloparin Sodium" in July and August this year. These USANs refer to the same active substance (Semuloparin), one being the sodium salt of the other. There are slight differences between the WHO INN process and the USAN process: INNs do not include the salt/counterion in the name, whereas USANs historically have. Now, for USANs, both the salt and the parent molecule get assigned distinct USANs.
Deadline for ESPOD Project on Malaria Target Discovery Is Approaching....
Tuesday, 3 August 2010
ChEMBL Resources for Drug Discovery Course - Feb 2011
We have penned into our diaries the dates of Monday February 14th 2011 thru Friday February 18th 2011 for the second ChEMBL residential training course. This will be held on campus here at Hinxton; registration and details of the sessions to be covered will appear on the EBI website shortly.
2010 was the year of our first course, and we had to prepare a lot of material, etc., but we really enjoyed it, and the 2011 course will be even better.
The image above is from the excellent xkcd.
Monday, 2 August 2010
From One Of Our Collaborators - MoSS+ChEMBL with Bioclipse
“Pharmaceutical Knowledge Retrieval through Reasoning of chEMBL RDF” is the title of my master's thesis, a twenty-week research project performed at the Department of Pharmaceutical Bioscience at Uppsala University (Prof. Wikberg, supervised by Egon Willighagen). The project aims at using the ChEMBL data with a technology that might be new to some: semantic web technologies. The life sciences workbench Bioclipse (doi:10.1186/1471-2105-10-397) has support for several semantic web tools, including RDF, and was used to establish such a connection.
Two aspects were looked at in this study. Firstly, we developed the search functionality for ChEMBL data to use RDF. For this, we took advantage of the RDF-ized ChEMBL knowledgebase (using the data from ChEMBL 02). Secondly, we developed a use case where compounds derived from ChEMBL are analyzed with the substructure mining software MoSS (see the Bioclipse Wiki). Here, we search for common and discriminative substructures within or between kinase families.
Within the context of these two aspects, we developed an application using both the JavaScript and the Wizard functionality in Bioclipse. The wizard shown above illustrates how various searches for compound-protein interactions can be formulated. Results are shown in the "Results table". The user can then select which data to save by moving it to the lower table, which lists the data that will be saved by this wizard.
A second, more application-targeted Wizard was developed that primarily concentrates on retrieving compounds that bind proteins in a certain kinase family with a given activity type (see below). A histogram can be opened to visualize the distribution of activities. Lower and upper bound values can be selected to focus, for example, only on the active compounds. A second, identical wizard page is provided to select a second dataset. This allows the user to set up a between-family data set. The saved data can then be used in the MoSS application to find the common and discriminative substructures (not shown).
Benefits of this approach focus on data interoperability: the RDF technologies are used as a uniform, Open Standard means of access to the ChEMBL data. Using this approach, implementing new search queries is very easy, and does not require one to know anything about the database schema; a common controlled vocabulary (ontology) hides those implementation details. Community standards for such vocabularies are under development, and will help integrate the ChEMBL data with other databases and other applications.
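To give a feel for why querying by vocabulary rather than by table layout is convenient, here is a minimal plain-Python sketch of triple-pattern matching, the basic idea underlying RDF queries. It is not SPARQL and not the Bioclipse API, and every identifier and predicate name in it (such as `ex:hasActivity`) is invented for this example rather than taken from the RDF version of ChEMBL.

```python
# Hypothetical (subject, predicate, object) triples. The identifiers and
# predicate names are invented for illustration; they are not the
# vocabulary used by the RDF-ized ChEMBL knowledgebase.
TRIPLES = [
    ("compound:1", "ex:hasActivity", "activity:10"),
    ("activity:10", "ex:onTarget", "target:kinaseA"),
    ("activity:10", "ex:type", "IC50"),
    ("compound:2", "ex:hasActivity", "activity:11"),
    ("activity:11", "ex:onTarget", "target:kinaseB"),
    ("activity:11", "ex:type", "Ki"),
]


def match(pattern, triples=TRIPLES):
    """Yield triples matching a 3-item pattern; None acts as a wildcard."""
    for triple in triples:
        if all(p is None or p == t for p, t in zip(pattern, triple)):
            yield triple


def compounds_for_target(target):
    """Compounds with any activity recorded against the given target."""
    found = []
    for activity, _, _ in match((None, "ex:onTarget", target)):
        for compound, _, _ in match((None, "ex:hasActivity", activity)):
            found.append(compound)
    return found


print(compounds_for_target("target:kinaseA"))
```

The query names only the relationships it cares about; nothing in it depends on how the underlying database tables are organised, which is the point made in the paragraph above.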
Does this sound interesting to you, or would you like to give us feedback? Please send a note to annzi.andersson+chembl@gmail.com . Further details are provided in my blog!
Sincerely, Annsofie Andersson.