Today’s paper [PMC] is Hert et al. (2009), Quantifying biogenic bias in screening libraries. At issue for today’s class is one of the first steps in drug discovery: compound library selection and generation. The authors pose a very interesting question: given the massive size of available chemical space, how do high throughput screening (HTS) efforts for drug discovery ever succeed?
Chemical space—that is, all possible molecules—is estimated to exceed 10^60 molecules with 30 or fewer heavy atoms; 10 µg of each would exceed the mass of the observable universe. This figure decreases if criteria for synthetic accessibility and drug likeness are taken into account, and increases steeply if up to 35 heavy atoms (about 500 Da) are allowed. Positing even a modest specificity of proteins for their ligands, the odds of a hit in a random selection of 10^6 molecules from this space seem negligible.
So, given this seemingly impossible vastness, how does HTS ever succeed in the first place? The authors offer at least two hypotheses:
HTS nevertheless does return active molecules for many targets; how does it overcome the odds stacked against it? One might hazard two hypotheses. First, molecules that are formally chemically different can be degenerate to a target, and many derivatives of a chemotype may have little effect on affinity. This behavior, and the polypharmacology of small molecules, undoubtedly contributes to screening hit rates. Such chemical degeneracy seems unlikely, however, to overcome the long odds against screening. A second explanation is that screening libraries are far from random selections, but rather are biased toward molecules likely to be recognized by biological targets. This second hypothesis seems more plausible, as many accessible molecules are likely to resemble or derive from metabolites and natural products. Some of these will have been synthesized to resemble such biogenic molecules, while others will have used biogenic molecules as a starting material.
Perhaps this is intuitive, but this is science, and we all know that intuition can only take you so far (if anywhere at all). To demonstrate their point, the authors enumerate all possible molecules with 11 or fewer heavy atoms built from first-row elements (about 26 million compounds) and compare both this enumerated space and real, commercially available screening libraries to natural product databases. They find a massive enrichment for natural product similarity in the real libraries relative to the enumerated chemical space (the 26 million compounds). If the available libraries were random selections, the similarity score distributions would be the same across the data sets. This is clearly not the case, indicating that a biogenic bias is present in current screening libraries. Moreover, as compound size (and complexity) increases, the similarity to known natural products rises sharply. This is very strong evidence of bias toward natural product chemistry in screening libraries.
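To make the comparison concrete: similarity between a library compound and a natural product is typically scored with the Tanimoto (Jaccard) coefficient on substructure fingerprints. Below is a minimal sketch of that metric in plain Python; the "fingerprints" are toy sets of feature IDs, not data from the paper, and this is an illustration of the general approach rather than the authors' exact pipeline.

```python
# Tanimoto (Jaccard) similarity on fingerprint bit sets: the fraction of
# substructure features shared between two molecules. Real pipelines use
# cheminformatics fingerprints (e.g. via RDKit); here we use toy sets.

def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient: |A ∩ B| / |A ∪ B|, in [0, 1]."""
    if not fp_a and not fp_b:
        return 0.0  # convention: two empty fingerprints score zero
    return len(fp_a & fp_b) / len(fp_a | fp_b)

# Toy "fingerprints": sets of substructure feature IDs (invented values).
library_compound = {1, 4, 7, 9, 12}
natural_product = {1, 4, 7, 11, 12}

score = tanimoto(library_compound, natural_product)
print(f"Tanimoto similarity: {score:.2f}")  # 4 shared / 6 total features
```

Comparing the distribution of each compound's best such score against a natural-product database, for a real library versus the full enumerated space, is the kind of analysis that reveals the enrichment described above.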
Returning to our motivating question, a major reason why the screening of synthetic compounds ever finds notable hits is that our libraries are biased toward the sort of molecules that proteins have evolved to recognize. Thus, there are almost as many metabolites and natural products among the 25,810 purchasable GDB molecules as there are among the 26 million GDB molecules overall. This bias increases rapidly as molecules grow in size, and the bias among larger lead-like and drug-like molecules is expected to be many orders of magnitude more still than that measured for the very small molecules explored here, where full enumeration allowed us to compare to a complete chemical space.
Now, you may say that this is obvious: these libraries are products of human invention and assembly. This is no doubt true; however, the analysis allowed the authors to determine whether there are major gaps in existing screening libraries, or even among available chemicals that might be introduced into screening libraries:
we suggest that screening libraries may be improved by increasing the bias toward biogenic molecules further still, by adding to libraries molecules resembling biogenic scaffolds that are now absent from them. After all, the bias in our current libraries is largely unintentional, the product of what organic chemists have synthesized since the birth of the field with urea in 1828. This leaves room for intentional optimization. Indeed, 83% of the core ring scaffolds present among natural products are simply absent among commercially available molecules, and by extension screening libraries… Biasing future screening libraries to fill these systematic absences in our current collections will help address the new genomic targets with which we are increasingly confronted, and against which screening has had such mixed success.
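The bookkeeping behind a figure like "83% of natural-product ring scaffolds are absent" is essentially set arithmetic over scaffold identities. The sketch below illustrates the idea with invented placeholder scaffold names; in practice each compound would be reduced to its core ring scaffold with a cheminformatics toolkit before the sets are compared.

```python
# Toy sketch of scaffold-coverage accounting: reduce each compound to its
# core ring scaffold (represented here as an opaque string), then ask what
# fraction of natural-product scaffolds never appears in the commercial
# catalog. All scaffold names below are invented placeholders.

natural_product_scaffolds = {
    "indole", "flavone", "steroid", "macrolide", "tropane", "quinoline",
}
commercial_scaffolds = {"indole", "quinoline", "benzene", "pyridine"}

# Scaffolds found in natural products but in no purchasable compound.
absent = natural_product_scaffolds - commercial_scaffolds
fraction_absent = len(absent) / len(natural_product_scaffolds)
print(f"{fraction_absent:.0%} of natural-product scaffolds are absent")
```

Scaffolds in the `absent` set are exactly the candidates the authors suggest prioritizing when deliberately biasing future libraries.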
Take-home message: our HTS libraries are massively (and, it turns out, usefully) biased toward the biogenic molecules that proteins have evolved to recognize. However, this biasing can be improved further, as thousands of naturally occurring ring scaffolds are simply not represented in screening libraries. There are likely good reasons for leaving some of these out (known toxicity, etc.), but, on the whole, a concerted effort to cover the chemical space of natural products in small-molecule screening could improve drug discovery.
Also see the previous discussion at Derek Lowe’s place.
Hert, J., Irwin, J., Laggner, C., Keiser, M., & Shoichet, B. (2009). Quantifying biogenic bias in screening libraries. Nature Chemical Biology, 5(7), 479–483. DOI: 10.1038/nchembio.180