Showing posts from January, 2013

Paper: A Ligand’s-Eye View of Protein Similarity

Gerard and I have just had a News & Views published in Nature Methods - link to the pdf is here. This is a commentary on a paper by Lin et al., which uses metrics derived from pharmacological similarity to cluster proteins. There are some interesting differences from the same proteins clustered by sequence similarity. Anyway, here's the N&V, and below it is the discussed paper (pdf link here).
%A G. Van Westen %A J.P. Overington %D 2013 %T A Ligand’s-Eye View of Protein Similarity %J Nature Methods %V 10 %P 116-117 %O doi:10.1038/nmeth.2339
%A H. Lin %A M.F. Sassano %A B.L. Roth %A B.K. Shoichet %T A pharmacological organization of G protein-coupled receptors %J Nature Methods %V 10 %P 140-146 %D 2013 %O doi:10.1038/nmeth.2324
jpo

ChEMBL_15 Released

We are pleased to announce the release of ChEMBL_15. This version of the database was prepared on 23rd January 2013 and contains:
1,434,432 compound records
1,254,575 compounds (of which 1,251,913 have mol files)
10,509,572 activities
679,259 assays
9,570 targets
48,735 documents
17 activity data sources
You can download the data from the ChEMBL FTP site. Please see chembl_15_release_notes.txt for full details of all changes in this release, including important schema changes!
Data changes since the last release: We have made several major changes/additions to the data in ChEMBL_15:
Incorporation of data from the USP Dictionary of USAN and International Drug Names.
Incorporation of monoclonal antibody clinical candidates and sequences.
Creation of targets for protein complexes and protein families.
Standardisation of activity data and identification of potential issues.
Annotation of predicted compo

ChEMBL 15 Schema Changes

ChEMBL_15 will be released this week. As mentioned previously, there will be some major schema changes. For many users, the most significant of these will be: 1) Removal of protein-specific information (e.g., sequences/accessions) from the target_dictionary to a separate 'component_sequences' table. The target_dictionary now includes entries for protein complexes, protein families and other 'group' targets. These then link to their protein components via the target_components table. 2) Removal of the assay2target table. Each assay now links only to a single target (though this target may consist of multiple proteins in the case of a protein complex/family). Information previously included on the assay2target table (tid, confidence_score etc) is now on the assays table. We have provided a diagram and documentation of the new schema on the chembl ftp site: ChEMBL_15 release documentation Please take some time to familiarise yourselves with the changes befor
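The new linkage can be sketched as a small, self-contained Python example. This is only an illustration against a toy in-memory SQLite database, not the real ChEMBL_15 schema: the table names, `tid` and `confidence_score` come from the post above, while `component_id`, `accession`, `pref_name`, `target_type`, `assay_id` and every row of data are assumptions made up for the sketch. The helper `components_for_assay` is likewise hypothetical.

```python
import sqlite3

# Toy, in-memory stand-in for the layout described above.
# Table names, tid and confidence_score come from the post; every other
# column name and all rows are illustrative assumptions.
SCHEMA = """
CREATE TABLE target_dictionary (tid INTEGER PRIMARY KEY, pref_name TEXT, target_type TEXT);
CREATE TABLE component_sequences (component_id INTEGER PRIMARY KEY, accession TEXT);
CREATE TABLE target_components (tid INTEGER, component_id INTEGER);
CREATE TABLE assays (assay_id INTEGER PRIMARY KEY, tid INTEGER, confidence_score INTEGER);
"""

ROWS = """
INSERT INTO target_dictionary VALUES
    (1, 'Toy complex AB', 'PROTEIN COMPLEX'),
    (2, 'Toy protein C', 'SINGLE PROTEIN');
INSERT INTO component_sequences VALUES (10, 'ACC_A'), (11, 'ACC_B'), (12, 'ACC_C');
INSERT INTO target_components VALUES (1, 10), (1, 11), (2, 12);
INSERT INTO assays VALUES (100, 1, 7), (101, 2, 9);
"""


def components_for_assay(conn, assay_id):
    """Return the accessions of all protein components behind one assay.

    The assay points at exactly one target (assays.tid); the target then
    fans out to one or more proteins through target_components.
    """
    rows = conn.execute(
        """
        SELECT cs.accession
        FROM assays a
        JOIN target_dictionary td ON td.tid = a.tid
        JOIN target_components tc ON tc.tid = td.tid
        JOIN component_sequences cs ON cs.component_id = tc.component_id
        WHERE a.assay_id = ?
        ORDER BY cs.accession
        """,
        (assay_id,),
    ).fetchall()
    return [accession for (accession,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA + ROWS)

print(components_for_assay(conn, 100))  # assay against the toy complex
print(components_for_assay(conn, 101))  # assay against the toy single protein
```

The point of the sketch is the shape of the join: one assay, one target, and then as many component proteins as the target has. Queries written against the old assay2target table would need to be reworked along these lines, using the real column names from the release documentation.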

UniChem Released

For data managers of chemistry resources, the maintenance of structure-based links to other chemistry resources can be a tedious chore. The job is all the more burdensome knowing that your counterparts in other chemistry-based resources are essentially duplicating your efforts in order to keep their links to your resource updated. In an attempt to remove this duplication of effort, and to automate the processes involved, we have developed UniChem, which is described in a recent publication. Structure-based links can be retrieved from UniChem either via the web interface or via the web services. For automated updating, the web services are often the best choice. The current set of web service methods has been designed to give users several options for obtaining links data. Two possibilities are detailed below. One such option would be to use the following methods: First, query UniChem for all valid src_id’s usi

New Drug Approvals 2012 - Pt. XXXV - Elvitegravir/Cobicistat/Emtricitabine/Tenofovir disoproxil fumarate (STRIBILD®)

Elvitegravir: Cobicistat: ATC Code: J05AR09 Wikipedia: Elvitegravir/Cobicistat/Emtricitabine/Tenofovir On August 27, the FDA approved the complete regimen for the treatment of Human Immunodeficiency Virus-1 (HIV-1) infection in adults who are antiretroviral treatment-naïve. STRIBILD® is a combination of an HIV-1 integrase strand transfer inhibitor (INSTI), Elvitegravir; a pharmacokinetic enhancer, Cobicistat; and two nucleos(t)ide analog HIV-1 Reverse Transcriptase (RT) inhibitors (NRTIs), Emtricitabine and Tenofovir disoproxil. Acquired immunodeficiency syndrome (AIDS) is a disease of the human immune system caused by HIV infection, in which progressive failure of the immune system allows life-threatening opportunistic infections and cancers to thrive. HIV infects and kills vital cells of the immune system such as T helper cells (specifically CD4+ T cells), macrophages and dendritic cells. When CD4+ T cell numbers de

MMV 11th Call for proposals - H2L and LO for Malaria Drug Discovery

Many of the readers of the ChEMBL-og are interested in drug discovery against neglected and rare diseases. One of the great things for us is the opening up of data in this field - there was the almost simultaneous release of primary HTS data from GSK, Novartis & St. Jude in 2011, and more recently the results of a GSK HTS for TB. Having this data publicly available, for all, means that many smart people can analyse it. Pooling data in this way is also effectively equivalent to running the assay against a far larger compound set, and allows more powerful cheminformatics analysis to identify chemical series, preliminary SAR, etc. Many of these datasets are available in our ChEMBL-NTD and ChEMBL-Malaria archives - and we know 2013 will be a great year for more data just like this! All these data are available for download, in the exact form as supplied by the depositor, no accounts/passwords, no lock-in to a software infrastructure, with no restrict

New Drug Approvals 2012 - Pt. XXXIV - Raxibacumab™

ATC Code: Wikipedia: Raxibacumab On December 14th 2012 the FDA approved Raxibacumab for the treatment of inhalation anthrax, a form of anthrax caused by the inhalation of anthrax spores. The drug is also approved to prevent inhalation anthrax when alternative therapies are not available or appropriate. Raxibacumab is a 146 kDa monoclonal antibody that is designed to neutralize the toxin secreted by Bacillus anthracis. The FDA granted raxibacumab fast track designation, priority review, and orphan product designation. Bacillus anthracis toxin (Anthrax toxin) is a secreted three-protein exotoxin. It consists of two enzyme components: lethal factor (LF, PDB 1PWU), a bacterial endopeptidase, and edema factor (EF, PDB 1PWW), a bacterial adenylate cyclase. These are combined with one cell-binding protein, protective antigen (PA, PDB 1ACC). The individual components are non-toxic and the combination of the enzyme components with the cell-binding protein makes them to

New Drug Approvals 2012 - Pt. XXXII - Bedaquiline (Sirturo™)

ATC Code: J04AK05 Wikipedia: Bedaquiline On December 28, the FDA approved Bedaquiline (as the fumarate salt; tradename: Sirturo; Research Code: R-403323 (for Bedaquiline Fumarate), R-207910 and TMC-207 (for Bedaquiline)), a novel, first-in-class diarylquinoline antimycobacterial drug indicated for the treatment of pulmonary multi-drug resistant tuberculosis (MDR-TB) as part of combination therapy in adults. Tuberculosis is an infectious disease caused by the mycobacterium Mycobacterium tuberculosis, which usually affects the lungs. MDR-TB occurs when M. tuberculosis becomes resistant to the two most powerful first-line anti-TB drugs, Isoniazid (ChEMBL: CHEMBL64) and Rifampin (ChEMBL: CHEMBL374478). Bedaquiline is the first anti-TB drug that works by inhibiting mycobacterial adenosine 5'-triphosphate (ATP) synthase (for Uniprot_IDs, click here), an enzyme essential for the replication of the mycobacteria. ATP is the most commonly used energy currency o

DjangoCon - Vote for ChEMBL!!!!

We have a talk entered for DjangoCon Europe 2013 - and there is a vote underway for this - the title of the talk you may wish to vote for is "Do you feel the chemistry? Developing scientific applications with Django." which is, you probably agree, a pretty interesting subject. You will need a github account to vote - but being hip cats you'll have one already. I must point out that this post has nothing to do with the smash-hit blockbuster film Django Unchained from the superstar director Quentin Tarantino (but search engines may well be too stupid to realise this and bump the rank of this post). Update: Voting is now closed. jpo

Where should you/can you publish your ChEMBL research?

Well, we've got to about 125 citations (1) for the main ChEMBL database paper so far, which for a year is a pretty good haul we think. Given this reasonably big number, we thought it would be appropriate to analyse where the use of ChEMBL makes its way into the published literature - or what is our 'research user community' (2). A simple way to analyse this is to look at papers that cite ChEMBL, grouped by journal. The graph is below - it's a classic log-normal/power law style frequency-class distribution. So J Chemical Information & Modelling (JCIM) is about 20% of all citations, and could indicate that the biggest early impact of ChEMBL is in the development of novel methods for compound design - which was one of our hopes for what our work and the ChEMBL data could achieve - better, safer drugs, quicker! Then there's the database community in Nucleic Acids Research (this is quite an unusual journal for comp chemists and modellers, but it is the de fa
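The grouping-by-journal step is easy to reproduce. Below is a minimal Python sketch, assuming you already have one journal name per citing paper; the list `citing_journals` and the placeholder names "Journal A" to "Journal D" are made up for illustration and are not the real citation data behind the graph. The helper `journal_shares` is a hypothetical name.

```python
from collections import Counter


def journal_shares(journals):
    """Group citing papers by journal.

    Returns (journal, count, share_of_total) tuples, most frequent first.
    """
    counts = Counter(journals)
    total = sum(counts.values())
    return [(name, n, n / total) for name, n in counts.most_common()]


# Placeholder input: one entry per citing paper (NOT real data).
citing_journals = (
    ["Journal A"] * 4 + ["Journal B"] * 3 + ["Journal C"] * 2 + ["Journal D"]
)

for name, n, share in journal_shares(citing_journals):
    print(f"{name}: {n} papers ({share:.0%})")
```

Sorting by count gives the rank-frequency view described above, where a few journals account for most of the citations and a long tail follows.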

Reminder: Pipeline Pilot Cambridgeshire UGM

This is a gentle reminder for the Cambridgeshire Pipeline Pilot Users Group Meeting that will take place on Thursday 17th January 2013 (aka tomorrow), at 3pm here at the ChEMBL HQ. This is the agenda for the meeting:
1. Welcome and Host talk: George Papadatos + Gerard van Westen - Cool things with Pipeline Pilot and ChEMBL
2. Peter Woollard (GSK) - Using Pipeline Pilot for computational biology capabilities, where it has helped the most and where it is less used
3. Richard Carter (Oxford Nanopore Technology) - PP on a memory stick
4. Mike Cherry (Accelrys) - Repetitive Data Flow
5. Question and Answer session including: how people have found NGS components and TAC components
6. Willem van Hoorn (Accelrys) - Matched Molecular Pairs
7. Adrian Stevens (Accelrys) - Upcoming chemistry components in PP9.0
There's still time so if you fancy attending, drop us a line. George

Paper: UniChem

We have just had a paper published on UniChem - simple name, simple functionality, but we love it. It has become the way that we map ChEMBL to other data sources and keep things linked in real time, while also keeping the ChEMBL molecule tables manageable. It's published in the Open Access Journal of Cheminformatics. There is an interface on the above UniChem link, but for most uses we anticipate REST web services access - details are on the link above. The link to the provisional pdf is here. One of the jolly blog pixies is writing a blog post showing some use cases for UniChem - and I have a lovely thing called "Chive" to tell you about in a few weeks!
%T UniChem: a unified chemical structure cross-referencing and identifier tracking system %A J. Chambers %A M. Davies %A A. Gaulton %A A. Hersey %A S. Velankar %A R. Petryszak %A J. Hastings %A L. Bellis %A S. McGlinchey %A J.P. Overington %J Journal of Cheminformatics %D 2013 %V 5 %O doi:10.1186/1758-2946-5

New Drug Approvals 2012 - Pt. XXXIII - Apixaban (ELIQUIS®)

ATC code: B01AF02 Wikipedia: Apixaban On December 28, the FDA approved Apixaban (Trade Name: ELIQUIS®; ChEMBL: CHEMBL231779; KEGG: D03213; ChemSpider: 8358471; DrugBank: DB07828; PubChem: CID 10182969) as an anticoagulant for prevention of venous thromboembolism and related events, indicated to reduce the risk of stroke and systemic embolism in patients with non-valvular atrial fibrillation. Atrial fibrillation (AF) is the most common cardiac arrhythmia (irregular heart beat). There are many classes of AF according to the American College of Cardiology (ACC), the American Heart Association (AHA) and the European Society of Cardiology (ESC), one of which is non-valvular AF: AF in the absence of rheumatic mitral valve disease, a prosthetic heart valve, or mitral valve repair (that is, AF which is not caused by a heart valve problem). AF usually increases the risk of stroke, which can be up to seven times that of the average population. AF is one of the major cardiog

Privacy and the ChEMBL Database

Privacy is pretty important - for example, in the picture above I have protected the privacy of two colleagues, as I think I should ;) In fact I've even made sure that the black box securing their identities is not a layer on the image that can be trivially removed..... Chemistry is a little different to some other areas of life-science research, and there is typically a little more caution applied in the use of 'public' database systems by people working on chemical structures - primarily because of patenting and novelty. There are probably similar privacy/security concerns over sequence data too - and in ChEMBL we've covered that too. I'm not going to drift on to what constitutes a 'publication', and all that sort of stuff since 1) I'm not qualified, 2) I don't have the time (and 1) anyway), and 3) it attracts trolls (and 1) and 2) anyway). I have been asked for a talk through on the usage and query privacy of ChEMBL as part of the great Open

Paper: Fuelling Open-Source Drug Discovery: 177 Small-Molecule Leads against Tuberculosis

As was announced last year, some of our collaborators in GSK Tres Cantos just published the results of a large antimycobacterial phenotypic screening campaign against Mycobacterium bovis BCG with hit confirmation in M. tuberculosis H37Rv. After the screening and in silico cascade, a set of 177 potent non-cytotoxic H37Rv hits was identified, providing a plethora of diverse potential starting points for new synthetic lead-generation activities to the global scientific community. The dataset is hosted in ChEMBL and can be downloaded from here with a short description here.
%T Fueling Open-Source Drug Discovery: 177 Small-Molecule Leads against Tuberculosis %A L. Ballell %A R.H. Bates %A R.J. Young %A D. Alvarez-Gomez %A E. Alvarez-Ruiz %A V. Barroso %A D. Blanco %A B. Crespo %A J. Escribano %A R. González %A S. Lozano %A S. Huss %A A. Santos-Villarejo %A J.J. Martín-Plaza %A A. Mendoza %A M.J. Rebollo-Lopez %A M. Remuiñan-Bla

New Drug Approvals 2012 - Pt. XXXI - Lomitapide (Juxtapid™)

ATC Code: C10AX12 Wikipedia: Lomitapide On December 21st, the FDA approved Lomitapide (Tradename: Juxtapid; Research Codes: BMS-201038-04, BMS-201038, AEGR-733), a Microsomal triglyceride transfer protein (MTP) inhibitor, as a complement to a low-fat diet and other lipid-lowering treatments, in patients with homozygous familial hypercholesterolemia (HoFH). Familial hypercholesterolemia is a genetic disorder, characterised by high levels of cholesterol-rich low-density lipoproteins (LDL-C) in the blood. This genetic condition is generally attributed to a mutation in the LDL receptor (LDLR) gene, whose product mediates the endocytosis of LDL-C. Lomitapide, through the inhibition of the microsomal triglyceride transfer protein in the liver, prevents the assembly of Apolipoprotein B-containing lipoproteins, which is required for the formation of LDLs, thus helping to lower the circulating LDL-C levels. The Microsomal triglyceride transfer protein, which res