While the vast majority of molecules in ChEMBL are small molecules, we also have a growing collection of peptide-derived compounds, monoclonal antibodies and other biotherapeutic drugs in the database. Historically, these molecules have been represented by molfiles (for small to medium-sized peptides) or protein sequences (for monoclonal antibodies).
However, for many biotherapeutics, these formats are not sufficient to represent the complexities of the molecules. Molfiles and other chemical structure formats are impractical for large molecules, and simple protein sequences cannot adequately capture the non-natural amino acids and other modifications that are commonplace in biotherapeutic drugs.
We are therefore working to adopt the HELM (Hierarchical Editing Language for Macromolecules) standard, developed by Pfizer and the Pistoia Alliance, within ChEMBL and plan to include HELM notation for all peptide-derived drugs and compounds in release 20 of the database.
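To give a feel for what the notation looks like, here is a minimal Python sketch that splits a simple HELM string into its monomers and connections. It is an illustration only, not the code used in ChEMBL: the example peptide is made up, the function names `split_monomers` and `parse_simple_helm` are hypothetical, and the parser assumes a simple HELM string in which `$` separates the sections, `|` separates entries within a section, and modified or non-natural monomers are written in square brackets.

```python
def split_monomers(body):
    """Split a polymer body on '.' while leaving bracketed monomer IDs intact."""
    monomers, current, depth = [], "", 0
    for ch in body:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "." and depth == 0:
            monomers.append(current)
            current = ""
        else:
            current += ch
    monomers.append(current)
    return monomers


def parse_simple_helm(helm):
    """Return (polymers, connections) for a simple HELM string.

    Assumption: only the first two '$'-separated sections are read
    (simple polymers and connections); the rest are ignored.
    """
    sections = helm.split("$")
    polymers = {}
    for chunk in sections[0].split("|"):
        polymer_id, _, body = chunk.partition("{")
        polymers[polymer_id] = split_monomers(body.rstrip("}"))
    connections = []
    if len(sections) > 1:
        connections = [c for c in sections[1].split("|") if c]
    return polymers, connections


# Made-up cyclic peptide: a bracketed (non-natural) monomer at position 2
# and a bond between the side chains (R3) of the cysteines at positions 3 and 6.
example = "PEPTIDE1{A.[dF].C.G.K.C}$PEPTIDE1,PEPTIDE1,3:R3-6:R3$$$"
polymers, connections = parse_simple_helm(example)
print(polymers)
print(connections)
```

The point of the sketch is that the modified residue and the cyclisation are both stated explicitly in a single short string, which a plain protein sequence cannot do.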
See also the recent press release for more information.