An interesting and thought-provoking paper from last year was 'Heteroaromatic Rings of the Future' (subscription required) by Will Pitt and colleagues at UCB. The basic idea of the paper was to exhaustively enumerate, and then analyse, all possible heteroaromatic ring systems that satisfy the following constraints: i) mono- and bicyclic ring systems only; ii) only 5- and 6-membered rings; iii) containing only C, N, O, S and H; iv) neutral; v) obeying Hückel’s 4n+2 rule of aromaticity; and vi) only exocyclic carbonyls. Heterocycles like these are at the very core of drug discovery and medicinal chemistry.
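As a small illustration of constraint (v), the arithmetic behind Hückel's rule can be written as a one-line check: a planar, fully conjugated ring is expected to be aromatic when its π-electron count equals 4n+2 for some non-negative integer n. The function name `satisfies_huckel` is made up for this post, and the sketch assumes the π-electron count is already known; it says nothing about how the paper itself counted electrons or applied the rule.

```python
def satisfies_huckel(pi_electrons: int) -> bool:
    """Return True if the count equals 4n + 2 for some integer n >= 0."""
    # 4n + 2 gives 2, 6, 10, 14, ... so subtract 2 and test divisibility by 4.
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0


# Pi-electron counts supplied by hand (an assumption of this sketch).
print(satisfies_huckel(6))   # benzene or pyridine: True
print(satisfies_huckel(10))  # naphthalene or indole: True
print(satisfies_huckel(4))   # cyclobutadiene: False
```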
The dataset is now available for download from the ChEMBL FTP site, and also as a Google document.
The file contains the following fields:
- regid: the id for each distinct ring system
- SMILES: the encoded chemical structure of each ring system
- Training dataset hits: the count of substructure hits found in the original search of commercial compound catalogues, drugs, etc. (as reported in the paper).
- Beilstein hits: the count of substructure hits in the Beilstein database at that time (June 2008). Some fields are blank because searching with benzene and other common ring systems would have taken too long.
- Pgood: predicted synthetic tractability after training with both of the above datasets
- Tautomer cluster: tautomeric equivalents are grouped into clusters
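For anyone wanting to work with the file programmatically, here is a minimal Python sketch of reading these fields. It assumes the download is tab-separated with a header row that uses the field names exactly as listed above; check the actual file and adjust `delimiter` or the column names if they differ. The two sample rows, including the regid values, are invented purely so the snippet runs on its own and are not taken from the dataset. Blank Beilstein hits are read as `None` rather than zero, since a blank means "not searched" and not "no hits".

```python
import csv
import io

# Hypothetical sample rows, NOT real VEHICLe data; replace with the real file.
SAMPLE = (
    "regid\tSMILES\tTraining dataset hits\tBeilstein hits\tPgood\tTautomer cluster\n"
    "R1\tc1ccccc1\t100\t\t0.99\t1\n"
    "R2\tc1ccncc1\t50\t1234\t0.95\t2\n"
)


def _to_number(text, cast):
    """Convert a field to a number, mapping blank fields to None."""
    text = text.strip()
    return cast(text) if text else None


def read_rings(handle, delimiter="\t"):
    """Read ring-system records from an open text handle (assumed layout)."""
    rows = []
    for rec in csv.DictReader(handle, delimiter=delimiter):
        rows.append({
            "regid": rec["regid"],
            "smiles": rec["SMILES"],
            "training_hits": _to_number(rec["Training dataset hits"], int),
            "beilstein_hits": _to_number(rec["Beilstein hits"], int),
            "pgood": _to_number(rec["Pgood"], float),
            "tautomer_cluster": rec["Tautomer cluster"],
        })
    return rows


rings = read_rings(io.StringIO(SAMPLE))
# Ring systems for which the Beilstein search was skipped.
unsearched = [r["regid"] for r in rings if r["beilstein_hits"] is None]
print(unsearched)
```

To read the real download, pass an open file instead of the sample, for example `read_rings(open(path, newline=""))`, where `path` is wherever you saved it.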
Will can be contacted at will.pitt (at) ucb.com for a free reprint of the paper, or for further discussion of the work.
We will integrate the VEHICLe ring system regids into ChEMBL at some point in the future.
%T Heteroaromatic Rings of the Future
%A W.R. Pitt
%A D.M. Parry
%A B.G. Perry
%A C.R. Groom
%J J. Med. Chem.
%D 2009
%V 52
%P 2952-2963
%O VEHICLe