Well, we've got to about 125 citations(1) for the main ChEMBL database paper so far, which we think is a pretty good haul for a year. Given this reasonably big number, we thought it would be appropriate to analyse where the use of ChEMBL makes its way into the published literature - or, put another way, who our 'research user community'(2) is. A simple way to analyse this is to look at papers that cite ChEMBL, grouped by journal. The graph is below - it's a classic log-normal/power law style frequency-class distribution.
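For anyone wanting to repeat the grouping-by-journal step on their own citation list, here is a minimal sketch in Python. It assumes you already have one journal name per citing paper in a plain list; the example list is made up for illustration and is not our real citation data, and the function name `citations_by_journal` is just a hypothetical helper.

```python
from collections import Counter


def citations_by_journal(journals):
    """Return (journal, count, share) tuples, most frequent journal first."""
    counts = Counter(journals)
    total = sum(counts.values())
    # most_common() sorts by descending count; share is the fraction of all citations
    return [(name, n, n / total) for name, n in counts.most_common()]


# Made-up input: one entry per citing paper (NOT the real ChEMBL citation data)
example = [
    "J Chem Inf Model",
    "Nucleic Acids Res",
    "J Chem Inf Model",
    "J Med Chem",
]

for name, n, share in citations_by_journal(example):
    print(f"{name}: {n} ({share:.0%})")
```

Plotting the counts in this rank order is what gives the long-tailed shape: a few journals with many citing papers, and lots of journals with just one or two.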
So the Journal of Chemical Information and Modeling (JCIM) accounts for about 20% of all citations, which could indicate that the biggest early impact of ChEMBL is in the development of novel methods for compound design - which was one of our hopes for what our work and the ChEMBL data could achieve - better, safer drugs, quicker! Then there's the database community in Nucleic Acids Research - so the data is being used and integrated elsewhere. (NAR is quite an unusual journal for comp chemists and modellers, but it is the de facto, and highest profile, place to publish "resource" papers in the life sciences, and it's a completely Open journal (3).) Then comes J Med Chem - the premier medicinal chemistry journal - and so on. It is also notable that ChEMBL has contributed centrally to two Nature full Articles this year (covered in earlier posts) - and given how infrequently chemistry makes the pages of the mighty Nature and Science, this is great news for us, and probably good for the entire community with respect to profile and awareness of the field!!
It's interesting to see the strong skew towards JCIM - this probably means that the journal has a receptive set of reviewers and knows how to route stuff to the right people (of course, if it then rejects 95% of all ChEMBL-citing papers, that's not such good news).
So what next? It got us thinking about how we would expect ChEMBL to impact the field and the literature in the long term - it's really unlikely that papers which build on methods and further-integrated data derived from ChEMBL, and go on to discover drugs, will ever cite the ChEMBL NAR paper. But we will try and track the ripples that ChEMBL makes over time...
A few notes.
1) Citation data is from Google Scholar. But c'mon Google - give us an API.
2) We know that many people who use ChEMBL are either not really interested in publishing, not free to publish their work, or don't have the time to publish alongside all the other junk they have to deal with.
3) Open, Closed, Gold, Green, Good, Evil, Cow, Horse..... The ChEMBL NAR paper itself (the one that has the 125 citations analysed above) is Open Access, and the entire ChEMBL database team is solely funded by The Wellcome Trust (including my position), so we are bound by their requirements for Open Access publishing. We cannot of course influence where researchers publish their use of ChEMBL (and there are many publications that use the ChEMBL data that do not cite our papers :( ), but they will be under their own funders' requirements - and remember that not all research is taxpayer (or similar) funded, so not all funders are as motivated to worry about Open Access, especially if it is yet another cost. So unfortunately, not all the papers that use ChEMBL are Open Access. But if you can, publish all your research and reviews Open Access - go on, it will make you smile and dogs in the street will like you!
Update - I've (jpo) done a bit of editing on this post overnight - I rushed it yesterday to catch a train, and thought that some additional context and comment was required.
jpo and francis