ChEMBL Resources


Thursday, 24 January 2019

FPSim2, a simple Python3 molecular similarity tool




FPSim2 is a new tool for fast similarity searches on large compound datasets (>100 million compounds), being developed at ChEMBL. We started developing it because we needed a Python3 library able to run fast similarity searches on datasets of this size, either in memory or out-of-core.
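To make "similarity search" concrete: the usual measure for binary fingerprints is the Tanimoto coefficient, |A ∩ B| / |A ∪ B|. Below is a minimal in-memory threshold search in plain Python. It assumes fingerprints are stored as Python ints used as bitsets, which is a simplification for illustration; it is not FPSim2's code or API, and `threshold_search` is a hypothetical name. It also shows a common pruning trick: Tanimoto can never exceed min(|A|, |B|) / max(|A|, |B|), so candidates whose bit counts make the threshold unreachable are skipped before computing the intersection.

```python
from typing import List, Tuple


def popcount(x: int) -> int:
    # Count set bits; int.bit_count() also works on Python >= 3.10.
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints stored as int bitsets."""
    union = popcount(a | b)
    if union == 0:
        return 0.0  # convention: two empty fingerprints score 0
    return popcount(a & b) / union


def threshold_search(
    query: int, fps: List[Tuple[int, int]], threshold: float
) -> List[Tuple[int, float]]:
    """Return (mol_id, similarity) pairs with similarity >= threshold, best first.

    fps is a hypothetical list of (mol_id, fingerprint) pairs held in memory.
    """
    query_count = popcount(query)
    hits = []
    for mol_id, fp in fps:
        # Bit-count bound: Tanimoto <= min(|A|,|B|) / max(|A|,|B|).
        # Empty fingerprints (hi == 0) are skipped as well.
        fp_count = popcount(fp)
        hi, lo = max(query_count, fp_count), min(query_count, fp_count)
        if hi == 0 or lo < threshold * hi:
            continue
        sim = tanimoto(query, fp)
        if sim >= threshold:
            hits.append((mol_id, sim))
    hits.sort(key=lambda h: h[1], reverse=True)
    return hits


if __name__ == "__main__":
    db = [(1, 0b1111), (2, 0b1110), (3, 0b0001)]
    print(threshold_search(0b1111, db, 0.7))  # prints [(1, 1.0), (2, 0.75)]
```

Molecule 3 is discarded by the bit-count bound alone (1/4 < 0.7), without computing its intersection with the query.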

It's fully written in Python/Cython and features:

The source code is available on GitHub, and Conda packages are available for macOS and Linux. To install it, type:

conda install rdkit -c rdkit 
conda install fpsim2 -c efelix

Try it with Docker (much better performance than Binder):

  •     docker pull eloyfelix/fpsim2
  •     docker run -p 9999:9999 eloyfelix/fpsim2
  •     open http://localhost:9999/notebooks/demo.ipynb in a browser

Or, if you prefer to try it without installing anything (yet), launch it on Binder!
Data files used in the demos are also available to download.


I would also like to thank Andrew Dalke and Greg Landrum for their blogs, which have been very useful resources!

Eloy

4 comments:

George Papadatos said...

Very cool! How does it compare to chemfp?

Eloy said...

Thanks!

I can only compare it to chemfp version 1.5, which is the open-source one.

FPSim2 is Python3 compatible, can use multiple threads for a single query, and has a fast-loading compressed file format.
SMILES, InChI and molfiles can be used as input for a search, but this comes at a cost.
FPSim2 can also run searches without loading all fingerprints (FPs) into memory at once. This even enables a Raspberry Pi to run similarity searches over UniChem (>150 million compounds) :)
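The out-of-core and multithreading points can be illustrated together. The sketch below assumes a hypothetical flat file of fixed-size records (an 8-byte id followed by a 1024-bit fingerprint); FPSim2 uses its own compressed file format instead, and `write_db`, `search_chunk` and `on_disk_search` are made-up names for this example. Each worker thread reads and scans one chunk of the file, so memory use is bounded by the chunk size rather than by the database size. Pure-Python bit counting holds the GIL, so here the threads mostly overlap I/O; real CPU parallelism needs a compiled kernel that releases the GIL, which Cython code can do.

```python
import os
import struct
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

FP_BYTES = 128  # hypothetical: 1024-bit fingerprints
RECORD = struct.Struct(f"<Q{FP_BYTES}s")  # 8-byte id + raw fingerprint bytes


def write_db(path: str, fps: List[Tuple[int, int]]) -> None:
    """Write (mol_id, fingerprint-as-int) pairs as fixed-size binary records."""
    with open(path, "wb") as fh:
        for mol_id, fp in fps:
            fh.write(RECORD.pack(mol_id, fp.to_bytes(FP_BYTES, "little")))


def tanimoto(a: int, b: int) -> float:
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def search_chunk(path: str, start: int, n_records: int,
                 query: int, threshold: float) -> List[Tuple[int, float]]:
    """Read only records [start, start + n_records) and return their hits."""
    with open(path, "rb") as fh:
        fh.seek(start * RECORD.size)
        data = fh.read(n_records * RECORD.size)
    hits = []
    for mol_id, raw in RECORD.iter_unpack(data):
        sim = tanimoto(query, int.from_bytes(raw, "little"))
        if sim >= threshold:
            hits.append((mol_id, sim))
    return hits


def on_disk_search(path: str, query: int, threshold: float,
                   chunk_size: int = 10_000, n_workers: int = 4):
    """Threshold search that never holds more than a few chunks in memory."""
    n_total = os.path.getsize(path) // RECORD.size
    starts = range(0, n_total, chunk_size)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        chunks = pool.map(
            lambda s: search_chunk(path, s, min(chunk_size, n_total - s),
                                   query, threshold),
            starts,
        )
        hits = [h for chunk in chunks for h in chunk]
    hits.sort(key=lambda h: h[1], reverse=True)
    return hits


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fps.bin")
        write_db(path, [(1, 0b1111), (2, 0b1110), (3, 0b0001)])
        print(on_disk_search(path, 0b1111, 0.7, chunk_size=2))
        # prints [(1, 1.0), (2, 0.75)]
```

The `chunk_size` parameter trades memory for I/O overhead: smaller chunks keep the footprint low, which is what makes searching very large files feasible on small machines.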

chemfp, being more mature software, has many more features, such as calculating full similarity matrices.
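For reference, an all-vs-all Tanimoto matrix is a direct extension of the single-query search. The sketch below uses plain Python with fingerprints as int bitsets; `similarity_matrix` is a hypothetical name, not an FPSim2 or chemfp function. It precomputes bit counts, computes each pair once and mirrors it across the diagonal, using |A ∪ B| = |A| + |B| - |A ∩ B|. A dense matrix needs N² entries, so for large datasets a thresholded, sparse representation is the usual choice.

```python
from typing import List


def popcount(x: int) -> int:
    return bin(x).count("1")


def similarity_matrix(fps: List[int]) -> List[List[float]]:
    """Dense all-vs-all Tanimoto matrix; each pair is computed only once."""
    n = len(fps)
    counts = [popcount(fp) for fp in fps]  # bit counts computed once per fingerprint
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Self-similarity is 1.0 (0.0 for an empty fingerprint, by convention).
        matrix[i][i] = 1.0 if counts[i] else 0.0
        for j in range(i + 1, n):
            common = popcount(fps[i] & fps[j])
            union = counts[i] + counts[j] - common
            sim = common / union if union else 0.0
            matrix[i][j] = matrix[j][i] = sim  # symmetric
    return matrix


if __name__ == "__main__":
    for row in similarity_matrix([0b1111, 0b1110, 0b0001]):
        print(row)
```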

FPSim2 still needs some optimisations and features, and we will benchmark it once some of this work is done.

Miquel Duran said...

Very nice tool. I am trying to move from chemfp to FPSim2... Are you planning to include the calculation of similarity matrices? Thanks!

Eloy said...

Glad you find it useful! This is a feature we have considered implementing, but it probably won't happen before the next ChEMBL release (26).