So, given that we'll start loading our biological drug sets into ChEMBL shortly, is there any key data missing? As always, any feedback on errors, etc. would be greatly appreciated. If anyone would like a file of all the sequences that we have, let me know.
A couple of notes on the data content:
- There will be some duplication, due primarily to the INNs not being released with Research Code information, whereas entries from clinicaltrials.gov typically come in under a Research Code name; after a few months the entries are linked. So any further information on this set would be greatly appreciated.
- The Phase number refers to the highest phase I could find the antibody drug reaching in the broad literature - it does not capture current status, and in fact a large number of these will have been abandoned by now.
- There are some ambiguities (to me at least) over the USAN year. I use the date the name is published, whereas USAN themselves appear to use a sometimes backdated date; this is probably due to inevitable gaps between the assignment of the name and its publication.