ChEMBL 37 arrives with nearly three million compounds, a new modality field for capturing targeted protein degraders, and a wave of data quality improvements.

A new field for targeted protein degradation

Around 29,000 bioactivity data points have been annotated with ACTIVITIES.MODALITY = “Targeted Protein Degradation”, including around 3,000 legacy bioactivities from approximately 1,200 remapped assays. This marks ChEMBL’s first systematic effort to flag targeted protein degradation (TPD) data across the database.

Extensive update of literature data

Scientific literature remains the backbone of ChEMBL, and ChEMBL 37 brings one of the more substantial literature updates in recent releases. Compared with ChEMBL 36, the literature source gained approximately 79,000 new assays, 270,000 new bioactivity data points and 52,000 new compound records, pushing the total literature-derived activity count past 9.55 million. Twelve new c...
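As a rough illustration of how the new TPD annotation might be used, here is a minimal sketch that counts activities carrying the “Targeted Protein Degradation” modality label and lists the assays they come from. It assumes a local SQLite copy of ChEMBL 37 in which the flag appears as a `modality` column on the `activities` table. The helper `build_toy_db()` and its rows are hypothetical stand-ins, not real ChEMBL data. With a real download, you would open the downloaded database file with `sqlite3.connect(...)` instead of calling `build_toy_db()`.

```python
import sqlite3
from typing import List

# Label string as given in the ChEMBL 37 release notes.
TPD_LABEL = "Targeted Protein Degradation"


def build_toy_db() -> sqlite3.Connection:
    """Hypothetical in-memory stand-in for a ChEMBL 37 SQLite dump.

    Assumption: the flag lives in a `modality` column of `activities`.
    The real table has many more columns; these rows are made up.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE activities ("
        "activity_id INTEGER PRIMARY KEY, assay_id INTEGER, modality TEXT)"
    )
    conn.executemany(
        "INSERT INTO activities VALUES (?, ?, ?)",
        [(1, 10, TPD_LABEL), (2, 10, TPD_LABEL), (3, 11, None), (4, 12, TPD_LABEL)],
    )
    return conn


def count_tpd_activities(conn: sqlite3.Connection) -> int:
    """Count activity rows annotated with the TPD modality (NULLs never match)."""
    row = conn.execute(
        "SELECT COUNT(*) FROM activities WHERE modality = ?", (TPD_LABEL,)
    ).fetchone()
    return row[0]


def tpd_assay_ids(conn: sqlite3.Connection) -> List[int]:
    """Return the distinct assays contributing TPD-flagged activities, sorted."""
    rows = conn.execute(
        "SELECT DISTINCT assay_id FROM activities "
        "WHERE modality = ? ORDER BY assay_id",
        (TPD_LABEL,),
    ).fetchall()
    return [r[0] for r in rows]


conn = build_toy_db()
print(count_tpd_activities(conn))  # 3
print(tpd_assay_ids(conn))         # [10, 12]
```

Filtering on an exact label match keeps the query simple. If the field's vocabulary grows in later releases, a lookup against the distinct values of the column would be a safer starting point than a hard-coded string.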
ChEMBL extracts data from the core medicinal chemistry literature and therefore reflects ongoing developments in drug discovery. One area currently attracting high interest is targeted protein degradation: compounds that direct disease-causing proteins to the cell’s degradation machinery. Data on these modalities are already present in ChEMBL and growing rapidly, making the database an important data source for the community.

Targeted Protein Degradation Data in ChEMBL

However, new and emerging modalities bring new challenges. Data should be well structured and FAIR, but for newer modalities, controlled vocabularies may not exist or may not adequately cover the breadth of data being reported. Curation effort in this area supports our broader goals of generating bespoke datasets and improving AI-readiness. We recently had the opportunity to attend the first ISCB UK conference, where we presented our work towards the capture and annotatio...