
Restructuring of ACTIVITY data: VALUE, TEXT_VALUE and ACTIVITY_COMMENT

ChEMBL is a bioactivity database for drug-like compounds. The ACTIVITIES table stores the readouts from bioassays that test compounds against a biological target. These are often numerical readouts such as Ki, IC50, Inhibition % or Cmax, but are occasionally non-numerical summary data, e.g., “Not Soluble”, “Not active” or “Active”. Historically, non-numerical data was captured as an ACTIVITY_COMMENT, but since ChEMBL 24 it has been captured more accurately as a TEXT_VALUE.

Whilst the TEXT_VALUE field has been used more extensively in recent releases, legacy data covering experimental outcomes, observations, context such as threshold values, and other metadata is still largely held in the ACTIVITY_COMMENT field. Going forward, only the TEXT_VALUE field will be used to report the primary outcome of an experiment whose output is not numerical, for example categorical data. This could be an Activity (e.g., “Active”/“Not active”), Toxicity (e.g., “Toxic”/“Not Toxic”) or Biotransformation (e.g., “Compound metabolized”) outcome. The VALUE field will continue to be used for numerical data, for example an IC50, MIC, Ki or % Inhibition value. Each individual data point in the ACTIVITIES table will have either a VALUE or a TEXT_VALUE, not both. ACTIVITY_COMMENT will only be used for additional information about a VALUE or TEXT_VALUE, for example the threshold for activity, a comment on compound solubility, or whether the measurement is “Dose-Dependent” or “Dose-Normalised”.
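To make the new convention concrete, here is a minimal Python sketch of the rule as a record-level check. The `ActivityRecord` class and `check_outcome_fields` function are hypothetical illustrations, not part of ChEMBL or any ChEMBL client library; the fields simply mirror the VALUE, TEXT_VALUE and ACTIVITY_COMMENT fields described above. Flagging comment-only records is an added assumption, meant to catch legacy rows whose outcome may still sit in ACTIVITY_COMMENT.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ActivityRecord:
    """Hypothetical stand-in for one row of the ACTIVITIES table."""
    activity_id: int
    value: Optional[float] = None           # numerical readout, e.g. an IC50
    text_value: Optional[str] = None        # non-numerical primary outcome, e.g. "Active"
    activity_comment: Optional[str] = None  # extra context only, e.g. "Dose-Dependent"


def check_outcome_fields(record: ActivityRecord) -> List[str]:
    """Return rule violations for one record (an empty list means it conforms)."""
    problems: List[str] = []
    has_value = record.value is not None
    has_text = record.text_value is not None
    if has_value and has_text:
        # The primary outcome is either numerical or textual, never both.
        problems.append("both VALUE and TEXT_VALUE are set")
    if not has_value and not has_text and record.activity_comment is not None:
        # Assumption: a comment with no outcome to qualify is probably a legacy
        # primary outcome that belongs in TEXT_VALUE.
        problems.append("outcome may still be stored in ACTIVITY_COMMENT")
    return problems
```

For example, a record with `value=1.0` and `text_value="Active"` would be reported as having both fields set, while a “Not active” TEXT_VALUE accompanied by a “Dose-Dependent” comment passes.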

In addition, we plan to standardise TEXT_VALUE data in the same way that we standardise UNITS and TYPE. For example, “Inactive”, “Not Active”, “INACTIVE”, “inactive” etc. will be automatically converted into the common annotation “Not active”. By separating and standardising the data, we will make it FAIRer, in particular more findable and usable.
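The standardisation step could look roughly like the sketch below, which assumes a simple approach: trim and collapse whitespace, case-fold, then look the result up in a synonym table. The table and function names are hypothetical, and only the “Not active” variants come from this post; the “Active” entry is an illustrative assumption, and ChEMBL's actual rules may well differ.

```python
import re
from typing import Optional

# Hypothetical synonym table. Keys are normalised spellings (case-folded,
# single-spaced); values are the standard annotations. The "Not active"
# variants come from the post; the "Active" entry is an assumption.
TEXT_VALUE_SYNONYMS = {
    "inactive": "Not active",
    "not active": "Not active",
    "active": "Active",
}


def normalise_key(raw: str) -> str:
    """Trim, collapse runs of whitespace and case-fold a raw TEXT_VALUE."""
    return re.sub(r"\s+", " ", raw.strip()).casefold()


def standardise_text_value(raw: str) -> Optional[str]:
    """Return the standard annotation for raw, or None if it is not recognised."""
    return TEXT_VALUE_SYNONYMS.get(normalise_key(raw))
```

Returning `None` for unrecognised strings, rather than guessing, is a deliberate choice in this sketch: such values would be left for manual curation instead of being silently mapped.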

To achieve this restructuring, manual curation has been used to determine whether each ACTIVITY_COMMENT is the primary outcome of an experiment or a true comment on the outcome. We have now assessed the most common ACTIVITY_COMMENTs and identified those to be migrated to the TEXT_VALUE field; the project will continue as our resources allow. The first data migration will take place in ChEMBL 37 and will transfer the most common legacy ACTIVITY_COMMENTs to the TEXT_VALUE field, enhancing the FAIRness of ChEMBL for our users.
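A migration of this kind might be sketched as follows, assuming a curator-approved list of comments that represent primary outcomes. The toy SQLite table uses lower-case column names that mirror the fields discussed here; it is not the full ChEMBL schema, and `build_demo_db` and `migrate_comments` are hypothetical helpers rather than ChEMBL's actual migration procedure.

```python
import sqlite3
from typing import Iterable


def build_demo_db() -> sqlite3.Connection:
    """Create a toy in-memory 'activities' table (not the real ChEMBL schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE activities (activity_id INTEGER PRIMARY KEY, "
        "value REAL, text_value TEXT, activity_comment TEXT)"
    )
    conn.executemany(
        "INSERT INTO activities VALUES (?, ?, ?, ?)",
        [
            (1, None, None, "Not active"),      # outcome stored as a comment
            (2, 5.3, None, "Dose-Dependent"),   # true comment on a numerical VALUE
            (3, None, None, "Not determined"),  # not on the approved list
        ],
    )
    conn.commit()
    return conn


def migrate_comments(conn: sqlite3.Connection, outcome_comments: Iterable[str]) -> int:
    """Move approved ACTIVITY_COMMENTs into TEXT_VALUE; return the number of rows moved.

    Only rows with neither a VALUE nor a TEXT_VALUE are updated, so no row
    ends up with both fields set.
    """
    moved = 0
    for comment in outcome_comments:
        # SET expressions read the row's original values, so text_value
        # receives the old comment before activity_comment is cleared.
        cur = conn.execute(
            "UPDATE activities "
            "SET text_value = activity_comment, activity_comment = NULL "
            "WHERE activity_comment = ? AND value IS NULL AND text_value IS NULL",
            (comment,),
        )
        moved += cur.rowcount  # rows changed by this UPDATE
    conn.commit()
    return moved
```

Rows that carry a numerical VALUE are skipped: there, a comment qualifies the measurement and stays an ACTIVITY_COMMENT, which keeps each row to either a VALUE or a TEXT_VALUE. Running the migration twice moves nothing the second time, because migrated rows no longer match the filter.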

If you have any questions about VALUE, TEXT_VALUE, ACTIVITY_COMMENT, or any data in ChEMBL, please contact chembl-help@ebi.ac.uk.
 

A Blog post by Sybilla Corbett
