We've just published a paper on mapping the sites of small molecule binding in complex multidomain proteins (pdf here - this link doesn't seem to work at the moment, sorry). The resolution of the mapping is at the level of Pfam domains. We love Pfam, and love even more that the Pfam team is moving to the EBI this week. There are several motivations for this work, and it addresses a pretty big problem in chemogenomics.
Firstly, there is the issue of domain frustration: you take a protein containing a series of distinct domains and search ChEMBL for homologues. If your protein contains a common and uninteresting domain, something like a zinc finger or EGF domain, you'll retrieve a whole bunch of data that is related by sequence but unrelated in terms of small molecule binding. (Our interest here is small molecule binding, remember. We're not saying that these domains are completely boring, they're just a lot less interesting from a chemical biology/drug discovery perspective.) It's just the way bioinformatics works. You can be selective and search with just the sequence of the domain you are interested in, but this only solves half the problem, since there's no guarantee that the compounds retrieved will actually bind at that domain in the retrieved protein.
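To make the problem concrete, here is a minimal toy sketch in Python. It is not the mapping method from the paper, just an illustration of the three kinds of search described above. All protein names, domain architectures, compound IDs and the `BINDING_DOMAIN` table are invented for the example, and "homology" is crudely approximated as "shares a Pfam domain".

```python
# Toy illustration only: every protein, compound and mapping below is made up.

# Pfam domain architecture of each protein.
PROTEIN_DOMAINS = {
    "query": ["Pkinase", "zf-C2H2"],
    "protein_A": ["Pkinase", "SH2"],
    "protein_B": ["zf-C2H2", "Hormone_recep"],
    "protein_C": ["PH", "Pkinase"],
}

# Compounds with activity recorded against each whole protein.
PROTEIN_COMPOUNDS = {
    "protein_A": ["cmpd_1"],
    "protein_B": ["cmpd_2"],
    "protein_C": ["cmpd_3", "cmpd_4"],
}

# Hypothetical domain-level mapping: the domain each compound binds.
BINDING_DOMAIN = {
    ("protein_A", "cmpd_1"): "Pkinase",
    ("protein_B", "cmpd_2"): "Hormone_recep",
    ("protein_C", "cmpd_3"): "Pkinase",
    ("protein_C", "cmpd_4"): "PH",
}


def whole_protein_search(query):
    """Compounds from every protein sharing any domain with the query."""
    query_domains = set(PROTEIN_DOMAINS[query])
    hits = set()
    for protein, domains in PROTEIN_DOMAINS.items():
        if protein != query and query_domains & set(domains):
            hits.update(PROTEIN_COMPOUNDS.get(protein, []))
    return sorted(hits)


def single_domain_search(domain):
    """Compounds from every protein that contains the given domain."""
    hits = set()
    for protein, domains in PROTEIN_DOMAINS.items():
        if domain in domains:
            hits.update(PROTEIN_COMPOUNDS.get(protein, []))
    return sorted(hits)


def domain_mapped_search(domain):
    """Compounds whose binding has been mapped to the given domain."""
    return sorted(
        compound
        for (_protein, compound), bound in BINDING_DOMAIN.items()
        if bound == domain
    )


print(whole_protein_search("query"))
print(single_domain_search("Pkinase"))
print(domain_mapped_search("Pkinase"))
```

In this toy set the whole-protein search drags in `cmpd_2` through the shared zinc finger, and the single-domain search still returns `cmpd_4`, which binds the PH domain of `protein_C` rather than its kinase domain. Only a search against a domain-level mapping leaves the compounds that bind at the domain of interest.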
These data will end up in the next version of ChEMBL - but if you want to get hold of the data before then, check out the supplementary data for the paper.
Next steps? Well, the approach to mapping and scoring the domains could be improved, and the resolution ideally needs to be at the level of a site within a domain, so that compounds that bind at different structural sites can be differentiated for model development, pharmacophores, etc. Compounds that bind at different sites, with different binding determinants, will not be able to predict each other, so merging them into a single set does not help. We have made some progress on this latter sub-problem, and more on that soon.
%T Mapping small molecule binding data to structural domains
%A F.A. Kruger
%A R. Rostom
%A J.P. Overington
%J BMC Bioinformatics
%D 2012
%V 13
%O doi:10.1186/1471-2105-13-S17-S11