Finding Compounds in Databases using UniChem

Have you ever identified an interesting compound and wondered what else is known about it? For example, is there any bioactivity data on it in ChEMBL or PubChem? Is there any toxicity data on it (CompTox)? And, having found interesting data on a compound, have you then wondered whether it can be purchased or whether it has been patented? All of this can be done using UniChem. Interested?

Come along to our webinar on 29th March at 2pm BST (3pm CEST, 9am EDT).

You will, however, need to register by emailing chembl-help. Places are limited, so please let us know as soon as possible if you register but are then unable to attend.

If you want to know more about UniChem please read on.

UniChem (https://www.ebi.ac.uk/unichem/) is a simple system we have developed to cross-reference compounds across databases, both internal and external to EMBL-EBI. Currently we have cross-references to 140 million compounds in 30 different databases. Information about the sources indexed in UniChem can be found on the UniChem website. UniChem is updated weekly with new compounds from these source databases.

So, for example, you can input a database identifier or an InChIKey into UniChem and see links to all the other indexed databases that have information about that compound.
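Before sending a query, it can help to check whether an input string is an InChIKey or a database identifier. The sketch below only checks the shape of a standard InChIKey (a 14-character block, a 10-character block and a single character, separated by hyphens) and splits it into those blocks. The function names are hypothetical and nothing here contacts UniChem.

```python
import re

# Standard InChIKey layout: 14 uppercase letters, hyphen, 10 uppercase letters,
# hyphen, 1 uppercase letter (27 characters in total).
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{10})-([A-Z])$")


def split_inchikey(key: str) -> tuple[str, str, str]:
    """Return the three blocks of an InChIKey, or raise ValueError if malformed."""
    match = INCHIKEY_RE.match(key.strip())
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    return match.group(1), match.group(2), match.group(3)


def looks_like_inchikey(text: str) -> bool:
    """True if the input has InChIKey shape; otherwise treat it as a database ID."""
    return INCHIKEY_RE.match(text.strip()) is not None


# Usage with a made-up key and a made-up identifier.
print(looks_like_inchikey("DB-12345"))  # False
print(split_inchikey("ABCDEFGHIJKLMN-OPQRSTUVSA-N"))
# ('ABCDEFGHIJKLMN', 'OPQRSTUVSA', 'N')
```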

If we take the drug paroxetine and search for it in UniChem, it is found in 22 databases and the UniChem webpage gives links to the paroxetine entries in those databases.

You don’t have to do this compound by compound using the web interface, though. UniChem has a comprehensive set of web services that you can use to retrieve data; alternatively, all the database files and source-to-source mapping files are available for download.
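If you take the download route, a source-to-source mapping file can be loaded into a lookup table. This is a minimal sketch that assumes a plain tab-separated file with one header line and two columns (source A identifier, source B identifier); that layout is an assumption, so check the header of the file you actually download. `load_mapping` is a hypothetical helper and the identifiers are made up.

```python
import csv
import io
from collections import defaultdict
from typing import TextIO


def load_mapping(handle: TextIO) -> dict[str, set[str]]:
    """Read a two-column, tab-separated mapping (one header line) into a dict
    from source-A identifier to the set of matching source-B identifiers.

    ASSUMPTION: the file layout is illustrative; adjust to the real format.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header line
    mapping: dict[str, set[str]] = defaultdict(set)
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or truncated lines
        mapping[row[0].strip()].add(row[1].strip())
    return dict(mapping)


# Usage with an in-memory stand-in for a downloaded file.
sample = io.StringIO("from_src\tto_src\nA1\tB7\nA1\tB9\nA2\tB3\n")
table = load_mapping(sample)
print(sorted(table["A1"]))  # ['B7', 'B9']
```

A set per identifier is used because one compound in source A can map to several entries in source B.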

UniChem relies on the InChIKey to map between databases, and this works fine if two databases have exactly the same structure for a compound. We all know, however, that this isn’t always the case. Sometimes a different salt or isotope was tested, or a mistake was made in the stereocentre assignment, meaning the InChIKeys no longer match.
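To see why a single stereocentre or isotope difference matters, here is a minimal sketch of exact InChIKey matching between two sources. The function is hypothetical and the identifiers and keys are made up; it only illustrates that a join on the full key finds nothing once any part of the key differs.

```python
def map_by_exact_inchikey(
    src_a: dict[str, str], src_b: dict[str, str]
) -> dict[str, list[str]]:
    """Map identifiers in source A to identifiers in source B sharing the full
    InChIKey. Both inputs map database identifier -> InChIKey."""
    by_key: dict[str, list[str]] = {}
    for b_id, key in src_b.items():
        by_key.setdefault(key, []).append(b_id)
    return {a_id: by_key.get(key, []) for a_id, key in src_a.items()}


# "b2" shares its first block with "a2" but has a different second block,
# as a stereo or isotope difference would produce.
src_a = {"a1": "ABCDEFGHIJKLMN-OPQRSTUVSA-N", "a2": "ZYXWVUTSRQPONM-AAAAAAAASA-N"}
src_b = {"b1": "ABCDEFGHIJKLMN-OPQRSTUVSA-N", "b2": "ZYXWVUTSRQPONM-BBBBBBBBSA-N"}
print(map_by_exact_inchikey(src_a, src_b))  # {'a1': ['b1'], 'a2': []}
```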

However, don’t despair: UniChem connectivity searching (https://www.ebi.ac.uk/unichem/info/widesearchInfo) can help. Because the InChI is built up in layers, it can be deconstructed, so that compounds differing in stereochemistry, isotopes, protonation state, etc. can be identified and mapped to each other. This works for single components as well as mixtures.
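The layering idea can be illustrated with a simplified sketch that keeps only the formula and the /c (connections) and /h (hydrogens) sublayers of a standard InChI and drops everything after them, so that, for example, two enantiomers reduce to the same string. This is not UniChem's algorithm, which (as described in the 2014 paper listed below) also handles salts and mixtures component by component; `connectivity_layer` is a hypothetical name.

```python
def connectivity_layer(inchi: str) -> str:
    """Return the InChI prefix, formula and main-layer /c and /h sublayers,
    dropping the layers that follow (charge, protonation, stereo, isotopes).

    Simplified sketch: a mixture is kept as one string rather than being
    split into its components.
    """
    prefix, _, body = inchi.partition("/")
    layers = body.split("/")
    kept = [layers[0]]  # the molecular formula always comes first
    for layer in layers[1:]:
        if layer.startswith(("c", "h")):
            kept.append(layer)
        else:
            break  # the main layer ends at the first other layer
    return prefix + "/" + "/".join(kept)


# Standard InChIs of L- and D-alanine differ only in the stereo layers.
l_ala = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
d_ala = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
print(connectivity_layer(l_ala) == connectivity_layer(d_ala))  # True
```

Stopping at the first layer that is not /c or /h (rather than filtering all /h layers) avoids picking up the isotopic hydrogen sublayer that can appear later in the string.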

Taking our paroxetine example:

We have paroxetine and a number of related compounds in ChEMBL. For example:

Maybe someone genuinely wanted to test these related compounds, or maybe they are errors (or a mixture of both). Whatever the reason, by using the UniChem connectivity searching feature we can identify any compounds that match paroxetine on the InChI connectivity layer.
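When only InChIKeys are at hand, a cheaper and approximate way to look for such relatives is to compare the first 14-character block, which is a hash of the main (connectivity) layer. Stereo and isotope differences change only the second block, and protonation only the final character, so those variants still match. Salts and mixtures, however, produce a different first block, which is one reason UniChem works on the InChI layers themselves. The helper name and compound records below are made up for illustration.

```python
def skeleton_matches(query_key: str, records: dict[str, str]) -> list[str]:
    """Return identifiers whose InChIKey shares its first (skeleton) block with
    query_key. records maps identifier -> InChIKey."""
    skeleton = query_key.split("-")[0]
    return sorted(
        rec_id for rec_id, key in records.items() if key.split("-")[0] == skeleton
    )


records = {
    "cmpd_parent": "ABCDEFGHIJKLMN-OPQRSTUVSA-N",      # hypothetical query compound
    "cmpd_epimer": "ABCDEFGHIJKLMN-QQQQQQQQSA-N",      # differs in second block only
    "cmpd_protonated": "ABCDEFGHIJKLMN-OPQRSTUVSA-O",  # differs in final flag only
    "cmpd_other": "NMLKJIHGFEDCBA-OPQRSTUVSA-N",       # different skeleton
}
print(skeleton_matches("ABCDEFGHIJKLMN-OPQRSTUVSA-N", records))
# ['cmpd_epimer', 'cmpd_parent', 'cmpd_protonated']
```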

The matches identified from a connectivity search starting with paroxetine can be found here:

At the webinar on 29th March we will describe how this is done in more detail and discuss some use cases. If you are interested, don’t forget to register.

If you want to read more, here are two papers about UniChem:

Chambers, J., Davies, M., Gaulton, A., Hersey, A., Velankar, S., Petryszak, R., Hastings, J., Bellis, L., McGlinchey, S. and Overington, J.P.
UniChem: A Unified Chemical Structure Cross-Referencing and Identifier Tracking System.
Journal of Cheminformatics 2013, 5:3 (January 2013).

Chambers, J., Davies, M., Gaulton, A., Papadatos, G., Hersey, A. and Overington, J.P.
UniChem: extension of InChI-based compound mapping to salt, connectivity and stereochemistry layers.
Journal of Cheminformatics 2014, 6:43 (September 2014).

The ChEMBL-og
