Tuesday, 29 January 2013

ChEMBL 15 Schema Changes

ChEMBL_15 will be released this week. As mentioned previously, there will be some major schema changes. For many users, the most significant of these will be:

1) Relocation of protein-specific information (e.g., sequences/accessions) from the target_dictionary table to a separate 'component_sequences' table. The target_dictionary now includes entries for protein complexes, protein families and other 'group' targets. These entries link to their protein components via the target_components table.

2) Removal of the assay2target table. Each assay now links to only a single target (though this target may consist of multiple proteins in the case of a protein complex/family). Information previously held in the assay2target table (tid, confidence_score, etc.) is now in the assays table.
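
As a rough illustration of how queries might look under the new layout, here is a minimal sketch using Python's built-in sqlite3 module and a toy in-memory database. Only the table names and the tid and confidence_score columns are taken from the description above; every other column name (assay_id, component_id, accession, sequence, pref_name, target_type), the helper function proteins_for_assay and all of the data are assumptions made up for the example, so please check the release documentation for the real definitions.

```python
import sqlite3

# Toy in-memory tables. Table names, tid and confidence_score follow the post;
# all other column names and every row below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE target_dictionary (
    tid INTEGER PRIMARY KEY, pref_name TEXT, target_type TEXT);
CREATE TABLE component_sequences (
    component_id INTEGER PRIMARY KEY, accession TEXT, sequence TEXT);
CREATE TABLE target_components (tid INTEGER, component_id INTEGER);
CREATE TABLE assays (
    assay_id INTEGER PRIMARY KEY, tid INTEGER, confidence_score INTEGER);

-- One 'group' target (a complex) made of two protein components.
INSERT INTO target_dictionary VALUES (1, 'Example complex', 'PROTEIN COMPLEX');
INSERT INTO component_sequences VALUES (10, 'ACC_A', 'MKT');
INSERT INTO component_sequences VALUES (11, 'ACC_B', 'MAS');
INSERT INTO target_components VALUES (1, 10);
INSERT INTO target_components VALUES (1, 11);

-- The assay row carries tid and confidence_score directly (no assay2target).
INSERT INTO assays VALUES (100, 1, 7);
""")


def proteins_for_assay(conn, assay_id):
    """Return (tid, confidence_score, accession) rows for one assay.

    Hypothetical helper: goes from the assay to its single target, then
    from that target to each of its protein components.
    """
    return conn.execute(
        """
        SELECT a.tid, a.confidence_score, cs.accession
        FROM assays a
        JOIN target_components tc ON tc.tid = a.tid
        JOIN component_sequences cs ON cs.component_id = tc.component_id
        WHERE a.assay_id = ?
        ORDER BY cs.accession
        """,
        (assay_id,),
    ).fetchall()


rows = proteins_for_assay(conn, 100)
print(rows)  # [(1, 7, 'ACC_A'), (1, 7, 'ACC_B')]
```

The point of the sketch is that a single assay row can still resolve to several proteins: the assay points at one tid, and the target_components table fans that target out to its component sequences.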

We have provided a diagram and documentation of the new schema on the ChEMBL FTP site:
ChEMBL_15 release documentation

Please take some time to familiarise yourselves with the changes before integrating the new dataset. Further information will be provided in the release notes, and we will be running a webinar in the next few weeks to explain the changes.

