Posts

Showing posts from January, 2019

FPSim2, a simple Python3 molecular similarity tool

FPSim2 is a new tool, being developed at ChEMBL, for fast similarity searches on big compound datasets (>100 million). We started developing it because we needed a Python3 library able to run fast similarity searches on datasets of that size, either in memory or out-of-core. It's written in Python/Cython and features:

- A fast population count algorithm (builtin-popcnt-unrolled) from https://github.com/WojciechMula/sse-popcount using SIMD instructions.
- Bounds for sub-linear speed-ups from 10.1021/ci600358f.
- A compressed file format with optimised read speed, based on PyTables and BLOSC.
- Use of multiple cores in a single search.
- In-memory and on-disk search modes.
- Simple and easy to use.

Source code is available on GitHub, and Conda packages are also available for either Mac or Linux. To install it, type:

    conda install rdkit -c rdkit
    conda install fpsim2 -c efelix

Try it with Docker (much better performance than Binder):

    docker pull eloyfelix/fpsim2
    docker run -p 9
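The two ideas behind the speed-ups named in this post, population counts and popcount bounds, can be illustrated with a minimal pure-Python sketch. This is not FPSim2's code or API: the function names, the use of Python integers as fingerprints, and the toy data are all assumptions made for illustration. The pruning rule relies on the fact that the Tanimoto similarity of two fingerprints can never exceed the ratio of the smaller popcount to the larger one, so for a threshold t only fingerprints whose popcount lies between a·t and a/t (a being the query popcount) need a full comparison.

```python
def popcount(fp: int) -> int:
    """Number of bits set in a fingerprint stored as a Python int."""
    return bin(fp).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity computed from popcounts only."""
    common = popcount(a & b)
    union = popcount(a) + popcount(b) - common
    return common / union if union else 0.0


def search(query: int, fps: list, threshold: float) -> list:
    """Return (index, similarity) pairs with similarity >= threshold.

    Hypothetical helper: fingerprints whose popcount alone rules them
    out are skipped before the full similarity is computed.
    """
    q_count = popcount(query)
    hits = []
    for idx, fp in enumerate(fps):
        fp_count = popcount(fp)
        largest = max(q_count, fp_count)
        # Upper bound on Tanimoto: min(popcounts) / max(popcounts).
        if largest == 0 or min(q_count, fp_count) / largest < threshold:
            continue
        sim = tanimoto(query, fp)
        if sim >= threshold:
            hits.append((idx, sim))
    # Most similar first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    toy_fps = [0b1111, 0b0111, 0b0001, 0b11111111, 0b1100]
    print(search(0b1111, toy_fps, 0.7))
```

In this toy run only the first two fingerprints pass the popcount check, so the other three are discarded without computing an intersection. A real implementation would typically keep the fingerprints sorted by popcount so that the candidate range can be located directly instead of scanned.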

2019 and ChEMBL – News, jobs and birthdays

Happy New Year from the ChEMBL Group to all our users and collaborators. Firstly, do you want a new challenge in 2019? If so, we have a position for a bioinformatician in the ChEMBL Team to develop pipelines for identifying links between therapeutic targets, drugs and diseases. You will be based in the ChEMBL team but will also work in collaboration with the exciting Open Targets initiative. More details can be found here (closing date 24th January). In case you missed it, at the end of last year we published a paper on the latest developments of the ChEMBL database, “ChEMBL: towards direct deposition of bioassay data”. You can read it here. Highlights include bioactivity data from patents, human pharmacokinetic data from prescribing information, deposited data from neglected disease screening and data from the IMI-funded K4DD project. We have also added a lot of new annotations on the therapeutic targets and indications for clinical candidates and marketed

RDKit, C++ and Jupyter Notebook

Fancy playing with the RDKit C++ API without needing to set up a C++ project and compile it? But wait... isn't C++ a compiled programming language? How can this even be possible? Thanks to Cling (CERN's C++ interpreter) and the xeus-cling Jupyter kernel, it is possible to use C++ as an interpreted language inside a Jupyter notebook! We prepared a simple notebook showing a few examples of RDKit functionality, and a Docker image in case you want to run it. With Docker installed on your computer as the single requirement, you'll be able to run the examples easily by following the three steps below:

    docker pull eloyfelix/rdkit_jupyter_cling
    docker run -d -p 9999:9999 eloyfelix/rdkit_jupyter_cling

Then open http://localhost:9999/notebooks/rdkit_cling.ipynb in a browser.