Friday, 11 February 2011

Chembl_09 Schema - A Different View

The guys at the NCI Cactus blog have done a great job of rendering the new ChEMBL schema as released in ChEMBL_09. There are quite a few changes, and we have started to load/curate new data against this schema - so check with us if your analyses rely on something currently in there! Click on the above image for a large view.

Things will be fairly quiet at ChEMBL Manor next week - we have the annual ChEMBL training course which will keep us busy, and hopefully out of any trouble.


Egon Willighagen said...

Thanx again for the tweet earlier about aminoglutethimide. Now, looking at the new schema and the ChEMBL content, I see it gets this as wrong as most other databases do: the entry is marked as a single chemical graph (without stereo) rather than as a racemic mixture. If it had been marked as racemic, I would not have needed your tweet for the explanation :)

Now, ChEBI actually has a good mechanism for making the distinction (talk to Janna). What are the plans for ChEMBL in this respect? Will we see this corrected? It clearly affects QSAR modeling, as the assay activities actually relate to one of the stereoisomers in the racemic mixture, or to a mix of both. That said, 3D QSAR descriptors have to be calculated from one geometry or the other, which introduces needless uncertainty into the model.

(And, obviously, this also affects how I should represent things in RDF :)

jpo said...

We have some plans in progress for this. Things are never simple, since we need to catch things like d and l (so unknown but opposite), trans across two adjacent stereocenters, etc. Just because a chemist publishes something without stereochem shown doesn't necessarily mean it is racemic.

Our initial focus will be on annotating the issues, to aid interpretation and curation.

A further complication we have come across in the past concerns some of the 'neglected' stereocenters, like sulphones. Finally, an interesting clinical candidate case we have corresponded with ChemSpider about recently is flesinoxan, where there are ambiguous links between the +/- and R/S.

I think an interesting area of chemoinformatics science at the moment, and one with quite a lot more potential, is reduced representation (in contrast to ever more explicit enumeration and calculation). The potential to develop robust landscapes at a lower 'resolution' is quite exciting.

As for calculating 3D descriptors, with undefined stereochemistry in lots of cases, or large numbers of possible enantiomers, coupled with large numbers of tautomers, and the problem of pKa prediction and assignment, I wish you the very best of luck.

Joerg Kurt Wegner said...

Is there more cross-linked information for the schema with respect to entries?
* What does assays.assay_type={A,B,F,U} stand for?
* Is it possible to expose a few example SQL queries somewhere?
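
As a rough illustration of the kind of query being asked about, here is a minimal sketch that counts assays per assay_type. It runs against a toy in-memory SQLite table, not the real ChEMBL_09 database: only the table name assays and the column assay_type come from the question above, while assay_id, description, and all rows are hypothetical. The codes are usually read as A = ADMET, B = Binding, F = Functional, and U = Unassigned, but that reading is an assumption here and is not confirmed anywhere in this thread.

```python
import sqlite3

# Toy stand-in for the ChEMBL `assays` table. Only `assays.assay_type` is
# taken from the thread; the other columns and every row are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assays ("
    "assay_id INTEGER PRIMARY KEY, assay_type TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO assays (assay_id, assay_type, description) VALUES (?, ?, ?)",
    [
        (1, "B", "hypothetical binding assay"),
        (2, "B", "another hypothetical binding assay"),
        (3, "F", "hypothetical functional assay"),
        (4, "A", "hypothetical ADMET assay"),
        (5, "U", "hypothetical unassigned assay"),
    ],
)

# Count how many assays fall under each assay_type code.
rows = conn.execute(
    "SELECT assay_type, COUNT(*) FROM assays "
    "GROUP BY assay_type ORDER BY assay_type"
).fetchall()
counts = dict(rows)
print(counts)
```

The same SELECT statement should carry over to a real ChEMBL installation, provided the table and column names match the released schema.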