The file is around 2.4 MB in size and is in FASTA format. The identifiers are simply the internal database identifiers (tids), but the entries also include organism and trivial protein names. Linking these through to UniProt, RefSeq, etc. is left, as they often say, as an exercise for the reader (for now). However, the file should give some idea of the diversity and distribution of sequences within the databases.
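For anyone wanting a quick look at that distribution, here is a minimal Python sketch that reads FASTA records and bins the sequences by length. The exact header layout of the file is not described here, so the code assumes only that the tid is the first whitespace-delimited token after the `>` and treats the rest of the header line as free text. The function names (`read_fasta`, `split_header`, `length_histogram`) are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def split_header(header: str) -> Tuple[str, str]:
    """Split a FASTA header (without the leading '>') into (identifier, description).

    Assumption: the identifier (here, the tid) is the first
    whitespace-delimited token; everything after it is description.
    """
    parts = header.split(None, 1)
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, description, sequence) for each FASTA record."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield split_header(header) + ("".join(chunks),)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield split_header(header) + ("".join(chunks),)


def length_histogram(
    records: Iterable[Tuple[str, str, str]], bin_size: int = 100
) -> Dict[int, int]:
    """Count sequences per length bin, keyed by the lower edge of each bin."""
    counts: Counter = Counter()
    for _identifier, _description, sequence in records:
        counts[(len(sequence) // bin_size) * bin_size] += 1
    return dict(sorted(counts.items()))
```

To use it, open the downloaded file and pass the file handle straight in, e.g. `length_histogram(read_fasta(handle))`. Grouping by organism would need the real header layout, which you would have to check against the file itself.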
Friday, 2 January 2009
ChEMBL Target Dictionary
Here is a link to the ChEMBL databases' target dictionary. This contains the sequences of the targets contained within the entire set of ChEMBL databases, with a few exceptions (primarily around CandiStore entries). The vast majority of these are from the StARlite medicinal chemistry database; however, not all of them currently are, so caveat emptor.