Monday, 12 April 2010

VEHICLe - virtual exploratory heterocyclic library

An interesting and thought-provoking paper from last year was 'Heteroaromatic Rings of the Future' (subscription required) by Will Pitt and colleagues at UCB. The basic idea of the paper was to exhaustively enumerate, and then analyse, all possible heteroaromatic ring systems that satisfy the following constraints: i) mono- and bicyclic rings only, ii) only 5- and 6-membered rings, iii) containing only C, N, O, S and H, iv) neutral, v) obeying Hückel's 4n+2 rule of aromaticity, and vi) only exocyclic carbonyls. Heterocycles like these are at the very core of drug discovery and medicinal chemistry.
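Two of these constraints are simple enough to express directly in code. The following is a minimal Python sketch, not the enumeration procedure used in the paper: the function names are hypothetical, and counting the pi electrons of a ring system is assumed to have been done elsewhere.

```python
# Elements permitted by constraint iii).
ALLOWED_ELEMENTS = {"C", "N", "O", "S", "H"}


def is_huckel_count(pi_electrons: int) -> bool:
    """Return True if the count equals 4n + 2 for some integer n >= 0."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0


def uses_allowed_elements(elements) -> bool:
    """Return True if every element symbol is one of C, N, O, S or H."""
    return set(elements) <= ALLOWED_ELEMENTS


# Pyridine has 6 pi electrons (n = 1); a fused bicyclic such as quinoline has 10 (n = 2).
print(is_huckel_count(6), is_huckel_count(10), is_huckel_count(8))
print(uses_allowed_elements(["C", "N", "H"]), uses_allowed_elements(["C", "P"]))
```

The checks only restate the rules as written above; they say nothing about whether a given ring system is chemically sensible or synthetically accessible.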

The dataset is now available for download from the ChEMBL FTP site, and also as a Google document.

The file contains the following fields:

  • regid: the id for each distinct ring system
  • SMILES: the encoded chemical structure of each ring system
  • Training dataset hits: the count of substructure hits found in the
    original search of commercial compound catalogues, drugs etc. (as reported in the paper).
  • Beilstein hits: the count of substructure hits in the Beilstein
    database at that time (June 2008). Some fields are blank because searching
    with benzene and other common ring systems would have taken too long.
  • Pgood: predicted synthetic tractability after training with both the
    above datasets
  • Tautomer cluster: tautomeric equivalents are grouped into clusters
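
As a rough illustration of working with these fields, here is a minimal Python sketch that reads such a table and groups ring systems by tautomer cluster. It assumes a tab-delimited file whose header row uses the field names listed above; the actual delimiter and header spelling in the download may differ. The sample rows, including the regids, counts, Pgood values and cluster labels, are invented for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows in the ASSUMED layout (tab-delimited, header as listed above).
# None of these values come from the real VEHICLe file.
SAMPLE = (
    "regid\tSMILES\tTraining dataset hits\tBeilstein hits\tPgood\tTautomer cluster\n"
    "X1\tc1ccccc1\t100\t\t0.9\tA\n"
    "X2\tc1c[nH]nn1\t5\t40\t0.7\tB\n"
    "X3\tc1cn[nH]n1\t2\t12\t0.6\tB\n"
)


def parse_count(value):
    """Convert a hit count to int; blank fields (search not run) become None."""
    value = value.strip()
    return int(value) if value else None


def load_rings(handle):
    """Read ring-system records from a tab-delimited file-like object."""
    rings = []
    for row in csv.DictReader(handle, delimiter="\t"):
        rings.append({
            "regid": row["regid"],
            "smiles": row["SMILES"],
            "training_hits": parse_count(row["Training dataset hits"]),
            "beilstein_hits": parse_count(row["Beilstein hits"]),
            "pgood": float(row["Pgood"]),
            "tautomer_cluster": row["Tautomer cluster"],
        })
    return rings


def group_by_cluster(rings):
    """Map each tautomer cluster label to the regids it contains."""
    clusters = defaultdict(list)
    for ring in rings:
        clusters[ring["tautomer_cluster"]].append(ring["regid"])
    return dict(clusters)


rings = load_rings(io.StringIO(SAMPLE))
print(group_by_cluster(rings))
```

Blank Beilstein fields are kept as None rather than 0, so that a search that was never run is not mistaken for a ring system with no known examples.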

Will can be contacted at will.pitt (at) for a free reprint of the paper, or for further discussion of the work.

We will integrate the VEHICLe ring system regids into ChEMBL at some point in the future.

%T Heteroaromatic Rings of the Future
%A W.R. Pitt
%A D.M. Parry
%A B.G. Perry
%A C.R. Groom
%J J. Med. Chem.
%D 2009
%V 52
%P 2952-2963
