In the original paper the structure is drawn as....
which is correct, and the same as found in Wikipedia (Brinzolamide) and ChEMBL (CHEMBL220491). The structure in the PDB file, however, is something else: an isomer of brinzolamide. The IUPAC name of the ligand in the coordinate set is
(4R)-2-(2-ETHOXYETHYL)-4-(ETHYLAMINO)-3,4-DIHYDRO-2H-THIENO[3,2-E][1,2]THIAZINE-6-SULFONAMIDE 1,1-DIOXIDE
So instead of a methoxy at the chain terminus there is an ethoxy, and the chain between the oxygen and the ring is one atom shorter - the oxygen has migrated one atom along. This is not brinzolamide, yet it is called brinzolamide both in the PDB entry (where the structure is wrong) and in the paper (where the structure is right). The most likely explanation is that when the crystallographers built the topology file for the ligand they mistyped the atom name/element/whatever - building these files was a pain back then.
As a first question - what should PDB curators do here? Spot the error and fix the data? Probably not, that's not the way the PDB works, but this kind of post-loading fixing and futzing around with the original data is common in other databases (e.g. ChEMBL). For me, this is the difference between an archive and a curated resource.
The next level of ambiguity is where people try to extract the chemical structure from the PDB entry (for historical reasons there is no connection table in the PDB file); there are two general ways of doing this - 1) from the IUPAC name in the header, and 2) from the coordinates. Most workers have tackled the latter method, but working out the bonding from the coordinates is surprisingly hard to do completely correctly.
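To give a feel for why the coordinate route is tricky, here is a minimal sketch of distance-based bond perception, the usual first step. The covalent radii are rounded literature values, the 0.45 Å tolerance is an assumed slack, and the three-atom ether fragment uses invented coordinates; none of this is taken from the PDB entry discussed here.

```python
import math
from itertools import combinations

# Approximate single-bond covalent radii in angstroms (rounded literature values).
COVALENT_RADIUS = {"C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}
TOLERANCE = 0.45  # assumed slack in angstroms; real tools tune this value


def perceive_bonds(atoms):
    """Return index pairs whose separation is within summed radii plus slack."""
    bonds = []
    for (i, (el_i, xyz_i)), (j, (el_j, xyz_j)) in combinations(enumerate(atoms), 2):
        cutoff = COVALENT_RADIUS[el_i] + COVALENT_RADIUS[el_j] + TOLERANCE
        if math.dist(xyz_i, xyz_j) <= cutoff:
            bonds.append((i, j))
    return bonds


# Hypothetical C-O-C ether fragment; coordinates invented for illustration.
fragment = [
    ("C", (0.00, 0.00, 0.00)),
    ("O", (1.43, 0.00, 0.00)),
    ("C", (2.00, 1.30, 0.00)),
]

bonds = perceive_bonds(fragment)
print(bonds)
```

Note that this only recovers connectivity. Bond orders, and the hydrogens that are usually absent from X-ray coordinate sets, still have to be inferred from geometry, and that second step is where a double bond can quietly go missing.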
So what happens in this case? Well the AffinDB resource has this (AffinDB is a great resource for structure-based design and dockers).
So, two problems here: first, the loss of stereochem off the ring (it is unambiguous in the 3D coords), and second, the loss of the double bond for the thiophene - this has the side effect of introducing two new chiral centres into the molecule (so eight possible stereoisomers, from the one defined structure used in the experiment). So the ligand, if converted to 3D from the above structure, could not recover the geometry as found in the database. Also the sulphonamide, which binds to the zinc in CAH, will no longer be acidic (aryl sulphonamides are weakly acidic), so the change will make a big difference to the inhibitor properties.
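The count of eight is just 2^3: one ring centre whose configuration was dropped, plus the two centres created by saturating the thiophene. A quick sketch of that arithmetic, assuming all three centres are independent and each is simply R or S (2^n is an upper bound in general, since symmetry can reduce it):

```python
from itertools import product

# Assumption: three undefined stereocentres, each either R or S.
undefined_centres = 3

# Every combination of R/S labels across the undefined centres.
assignments = ["".join(labels) for labels in product("RS", repeat=undefined_centres)]

print(len(assignments))
print(assignments)
```

Only one of these assignments corresponds to what was actually in the crystal, and even that one has the wrong ring system.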
And what about PDBe? Well the structure there is this....
Which gets the double bond in the thiophene right, but introduces a spurious chiral centre at the sulphonamide nitrogen. This is quite a subtle case, since the nitrogen in a sulphonamide has a lot of sp3 character, and maybe one configuration is trapped in the crystal complex - but in solution, it will very rapidly invert and equilibrate - it is not a chiral centre.
In summary, this chain of events makes the data integration problem a hard one (for example, if one wanted to query across ChEMBL, PDBe and AffinDB at least): there are conflicting statements about the identity of a particular molecule, and taking the PDB entry at face value would be confusing. So, data integration is hard! 'Trust and Verify' is the mantra, but trust and verify names and synonyms even more.
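One cheap way to 'verify' across resources is to compare structure-derived identifiers rather than names. As a rough sketch, the snippet below splits two InChIKeys (the ones quoted in the first comment on this post) into their blocks. The first 14-character block is a hash of the molecular skeleton, and the second block covers the remaining layers such as stereochemistry. The helper names are made up for illustration.

```python
# InChIKeys as quoted in the first comment on this post.
keys = {
    "CID 4369091": "BHFKHYVXDQDFSR-JTQLQIEISA-N",
    "CID 68844 (brinzolamide)": "HCRKCZRJWPKOAR-JTQLQIEISA-N",
}


def split_inchikey(key):
    """Split an InChIKey into (skeleton block, second block, protonation flag)."""
    skeleton, second, protonation = key.split("-")
    return skeleton, second, protonation


def same_skeleton(key_a, key_b):
    """True when the two keys share the first (connectivity) block."""
    return split_inchikey(key_a)[0] == split_inchikey(key_b)[0]


a, b = keys.values()
skeletons_match = same_skeleton(a, b)
second_blocks_match = split_inchikey(a)[1] == split_inchikey(b)[1]

print(skeletons_match, second_blocks_match)
```

Here the skeleton blocks differ while the second blocks agree, which is what you would expect for two constitutional isomers sharing the same stereo description, and it flags that the shared name 'brinzolamide' is covering two different molecules.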



6 comments:
John, following the PubChem links makes this more interesting … and it looks even worse, which supports your point. The compound you start with (CID 4369091 = BHFKHYVXDQDFSR-JTQLQIEISA-N) has six SIDs including your CHEMBL1231543 (picked up as ligand BZU from PDBe), WO1993016701 (Merck), and Shanghai IOC from PDB protein 1a42 from human carbonic anhydrase II. As Merck presumably made this as a brinzolamide analogue, the question here is “is this correct in the crystal structure and wrongly labeled in the paper, or vice versa?”. Now, brinzolamide (= CID 68844 = HCRKCZRJWPKOAR-JTQLQIEISA-N) has SIDs including CHEMBL220491, BZ1 as MMDB (58388.3) and SMID (BZ1 PDB ligand entry for protein 3znc, murine carbonic anhydrase II), so might this be correct? By my count at this point we have PDB, PDBe, SIOC, AffinDB, SMID (but defunct?) and MMDB all attempting to divine the ligand structures... I have noticed PDBe and MMDB differences before (http://cdsouthan.blogspot.com/2011/08/compound-to-target-mappings-part-i.html) and this has happened here to CID 444137 = AHEOPDLJOFFHFL-VUWPPUDQSA-N = BZU from 1A42 = MMDB (55041.4) = SID 26702126
Can anyone figure out how to deal with such an issue?
The short answer is no. There are multiple levels of problems here. First, the wrong thing was built into the X-ray structure, so the 'real' experimental structure was technically incorrect; then the parsing of this structure was wrong, and the incorrect annotation was passed along the information chain.
For me, it argues quite strongly for provenance descriptions in resources, and clear demarcation of primary and secondary data sources.
Thanks Chris for the full reply! My guess is they made the topology file wrong, since to discuss the compound as brinzolamide while knowingly complexing something else would have been quite misleading.
Is the density map in the crystal enough to pin down that oxygen and unequivocally assign the ligand as CID 4369091 or 68844?
In my opinion, no. It would need very high resolution, very high quality data to try this. Usually the environment is a good way to assign: look for H-bonds to discriminate....