MedChem Attractiveness and Redundancy -
Looking for value in Compounds and Chemical Space
Jen Loesel, Pfizer
A more diverse screening file is a better screening file. A bigger screening file is a better screening file. Are these statements really true? This talk critically examines both of them.
In part 1 we will investigate the quality of chemical structures. A good screening file needs to balance quality against diversity.
We developed an algorithm based purely on structure to achieve this. The algorithm is able to compete with medicinal chemists in ranking the attractiveness of compounds as defined by the consensus opinion of multiple chemists. We called the score MedChem Attractiveness (MCA). The score is an important step towards quantifying the quality of chemical structures. It complements existing algorithms for novelty and diversity as well as filters like the Ro5.
In part 2 of the talk we look at the size and economy of the screening file. The value of the whole screening file isn’t simply the sum of the values of its individual compounds. There is a limit beyond which a screening file becomes too big and costly for the purpose it serves: finding new leads for novel MedChem projects in an efficient manner.
Primary screens at Pfizer often yield large numbers of very similar hit compounds. Beyond the first few active members, these large clusters of active compounds add limited value for Hit Identification. To streamline our screening operation we analysed the probability of finding actives in recent HTS screens based on fingerprint similarity. We combined the results from the HTS analysis with Belief Theory. This allowed us to define the ideal density of neighbours in chemical space for lead identification. Based on that density we defined a new property of the chemical space that we call Redundancy. Redundancy represents the fraction of compounds populating chemical space beyond the ideal density for efficient Hit Identification screening.
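To make the idea of Redundancy concrete, here is a minimal Python sketch. It is not the method presented in the talk: the abstract does not specify the fingerprint, the similarity cutoff, the ideal density, or how Belief Theory enters, so all of those are assumptions here. Fingerprints are modelled as sets of "on" bit indices, similarity is the Tanimoto coefficient, and the hypothetical parameters `sim_threshold` and `ideal_density` stand in for the values the authors derived from their HTS analysis. Compounds are visited in order, and a compound counts as redundant once its neighbourhood already holds `ideal_density` retained compounds.

```python
from typing import FrozenSet, Sequence

# Assumption: a fingerprint is represented as the set of its "on" bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def redundancy(
    fingerprints: Sequence[Fingerprint],
    sim_threshold: float,  # hypothetical similarity cutoff for "neighbour"
    ideal_density: int,    # hypothetical max retained compounds per neighbourhood
) -> float:
    """Fraction of compounds that exceed the ideal neighbour density.

    Greedy and order-dependent: a compound is retained unless at least
    `ideal_density` already-retained compounds are its neighbours.
    """
    if not fingerprints:
        return 0.0
    retained = []
    redundant = 0
    for fp in fingerprints:
        neighbours = sum(
            1 for kept in retained if tanimoto(fp, kept) >= sim_threshold
        )
        if neighbours >= ideal_density:
            redundant += 1
        else:
            retained.append(fp)
    return redundant / len(fingerprints)


# Toy file: three identical compounds plus one unrelated compound.
toy_file = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 4}),
    frozenset({10, 11, 12}),
]
toy_redundancy = redundancy(toy_file, sim_threshold=0.7, ideal_density=1)
```

Because the pass is greedy, the result depends on the order of the compounds; sorting the file by a quality score first, so that the most attractive members of a cluster are retained, would be one natural design choice, though the abstract does not say whether the two scores were combined this way.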
This work was no academic exercise. The model resulted in the permanent deletion of >1 million compounds from the screening file. The result is a higher quality and more efficient Pfizer screening file for the future. Both algorithms are very generic and can be applied or adapted to a variety of other uses.