Slides and recordings from the recent ChEMBL UGM are starting to appear on the meeting website. Here I want to draw attention to the presentation by Nicolas Bosc and me on "SureChEMBL: Your next source of research data". Ever since my NextMove Software days, I've been aware of the large amount of scientific data available in patents. This includes everything from large sets of chemical analogs to bioactivity values, reactions, and NMR spectra. US patents in particular are a rich source of data as they are (a) born digital and (b) freely available, so automated tools that extract the relevant data can generate substantial, high-quality datasets. For example, here's a graph I made back in 2017 to illustrate a NextMove Software blog post entitled "Are more bioactivities available from patents than from the academic literature?". It compared the data deposited into ChEMBL from papers with the data extractable from patents by LeadMine (see the talk linked fr...
Blog of Chemical Biology Services at EMBL-EBI