Report of the NIGMS Workshop on Challenges in Docking and Virtual Screening

August 26, 2005

The genesis of the meeting was the sense on the part of computational chemists in the pharmaceutical industry that the field of structure-based ligand discovery had reached a plateau. That is, excellent chemical intuition from the scientist behind the computer screen remains essential for the application of the methods of structure-based drug design. Recent publications comparing the capabilities of numerous docking and scoring algorithms support this position. The key question, then, is what funding agencies should do to push the field to the next level. A related question is what users in industry can do to facilitate this next step.

The meeting began by examining the premise that the major challenge in structure-based drug design is scoring, not docking. The accuracy of predicting the correct pose of an active compound can be quite good. Virtual screening of a library leading to enrichment can be done reasonably well, although there is no a priori way to know which docking algorithm will work best on the target of interest. The real goal is not only to rank order ligands by binding but to predict actual affinities well enough that the number of compounds that need be evaluated experimentally can be significantly reduced.
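As an illustration of what "enrichment" means in a virtual screen, the following minimal sketch computes an enrichment factor for the top-ranked fraction of a library. The function name, the convention that lower scores are better, and the toy scores and labels are assumptions made for this example; they are not data or methods from the workshop.

```python
from typing import Sequence


def enrichment_factor(
    scores: Sequence[float],
    is_active: Sequence[bool],
    top_fraction: float = 0.01,
) -> float:
    """Enrichment factor of a ranked library.

    EF = (fraction of actives in the top-ranked subset)
         / (fraction of actives in the whole library)

    Assumption: a lower (more negative) score means a better predicted
    binder. Scoring conventions differ between docking programs.
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    n_actives = sum(1 for flag in is_active if flag)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")

    # Size of the top-ranked subset, at least one compound.
    n_top = max(1, round(n_total * top_fraction))

    # Rank compound indices from best (lowest) to worst (highest) score.
    ranked = sorted(range(n_total), key=lambda i: scores[i])
    hits_in_top = sum(1 for i in ranked[:n_top] if is_active[i])

    return (hits_in_top / n_top) / (n_actives / n_total)


# Hypothetical toy library: 10 compounds, 2 of them known actives.
toy_scores = [-9.1, -8.7, -8.2, -7.9, -7.5, -7.0, -6.4, -6.0, -5.5, -5.1]
toy_labels = [True, False, True, False, False,
              False, False, False, False, False]

ef_top20 = enrichment_factor(toy_scores, toy_labels, top_fraction=0.2)
print(f"EF at top 20%: {ef_top20:.2f}")
```

An enrichment factor above 1 means the ranking finds actives faster than random selection would. It says nothing about whether the predicted affinities themselves are accurate, which is the harder goal described above.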

Since one is always sacrificing accuracy for computational speed, especially in screening large libraries, we might anticipate that faster computers will solve the problem by allowing us to use more accurate computational methods routinely. The consensus of the group, however, was that unless we can get the other things right, faster computers will get us only part of the way. In fact, given the inherent problems with structure-based drug design, it is remarkable that it works as well as it does.

So why are we not making more rapid progress? It is clear that we are not getting the physical chemistry right. Two examples:

  1. It is often observed that as the size of a ligand increases, the predicted affinity to a target continues to increase. In contrast to prediction, the affinity of a ligand series often reaches a plateau and does not increase further with size.
  2. Significant changes in binding energies can arise from small changes in the structure of the target, for example, in the development of drug resistance in AIDS. In fact, there is no way to estimate the reliability of the results from structure-based drug design.

To improve this situation, a number of problems must be solved. It is hard to deal with water molecules, that is, to determine which are intimately involved in the binding of the ligand to the target. Dealing with other basic physicochemical properties, such as pKa, salt effects, and conformational states, is also difficult. Entropies and enthalpies are notoriously hard to calculate. There is also the problem of accounting for protein reorganization as a result of ligand docking, as in the case of HIV reverse transcriptase.
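To make concrete why errors in computed energies matter so much for predicted affinities, the sketch below uses the standard thermodynamic relation between the binding free energy and the dissociation constant, ΔG° = RT ln(Kd / 1 M), where ΔG° = ΔH° − TΔS°. The function names, the default temperature of 298.15 K, and the choice of kcal/mol are assumptions made for this illustration; the workshop did not prescribe any particular calculation.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.987204e-3


def kd_to_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    Uses dG = RT * ln(Kd / 1 M); tighter binders give more negative values.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_molar)


def free_energy_to_kd(dg_kcal: float, temperature_k: float = 298.15) -> float:
    """Dissociation constant (molar) from a standard binding free energy."""
    return math.exp(dg_kcal / (R_KCAL_PER_MOL_K * temperature_k))


def fold_error_in_affinity(energy_error_kcal: float,
                           temperature_k: float = 298.15) -> float:
    """Multiplicative error in Kd caused by an error in the free energy."""
    return math.exp(abs(energy_error_kcal) / (R_KCAL_PER_MOL_K * temperature_k))


dg_nanomolar = kd_to_free_energy(1e-9)      # a 1 nM binder
fold_for_2kcal = fold_error_in_affinity(2.0)

print(f"1 nM binder: dG = {dg_nanomolar:.2f} kcal/mol")
print(f"A 2 kcal/mol error changes Kd by about {fold_for_2kcal:.0f}-fold")
```

At room temperature, each factor of ten in affinity corresponds to only about 1.4 kcal/mol. Because ΔG° is typically a small difference between much larger enthalpic and entropic contributions, modest errors in either term can change a predicted affinity by an order of magnitude or more.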

Thus in looking to the future, one has to consider improvements in both empirical and physics-based models. What are the known gaps in our understanding of the physics of the process? Could there be as yet unanticipated gaps? Would better sampling help? Can searches be done more efficiently? How much can we gain from clever engineering and optimization? And how far will incremental improvements take us? Do we need fundamental new discoveries?

One issue that can be addressed is how to evaluate our methods, to know if they are improving. One mechanism to explore the state of ligand docking could be a contest similar in nature to the "protein folding contest" or CASP. The idea of a ligand-docking contest did not receive overwhelming support at this time, as the need for providing additional systematic data was considered more pressing. The group elected to defer discussion of contests to a later date, but did not rule out the concept.

Clearly, good datasets containing measured affinities of ligands for targets are necessary. Some datasets, such as those for HIV protease, are already available. However, more sets that represent a range of protein structures are needed; agreement on common benchmarks in the field would also be desirable. For a given target, ligands with a wide range of affinities, including positives and negatives ("decoys"), are also needed. Many of these data reside in industry. The release of those data for which adequate intellectual property protection has been obtained, or which represent "abandoned" projects, would be extremely valuable for the whole field. Corollary datasets, such as crystal structures generated for a series of bound ligands, are also important. Again, many of these valuable datasets reside in industry. There are obstacles to the release of these data. One is the lack of incentives. The second is cost: for example, crystal structures may need to be further refined or formatted for submission to the Protein Data Bank (PDB).

The generation of other "standard" data, for example solubility, for a well-defined set of compounds is also important to "train" the algorithms. Given its mission, the National Institute of Standards and Technology (NIST) might prove to be a valuable player. Finally, there also needs to be the capability to create some "living" data systems, for which new compounds, structures and experimental data can be generated when needed to more rigorously test improved docking and screening methods.

If data are released, how can they be made available to the scientific community? Chris Austin presented the PubChem website, which is collecting and making available the kinds of data envisaged here. This database would seem to be an ideal repository for these data. Discussions should continue.

Next steps

Attendees agreed that an important step in facilitating the improvement of structure-based drug design methods is to make additional data available to researchers. Since release of such data from the pharmaceutical industry requires that we design a process that holds substantial promise, it was agreed to hold another meeting in fairly short order, in three months. Included, in addition to scientists from industry and academia, should be representatives from the Foundation for the NIH, the NIH Molecular Libraries Roadmap, and the PubChem database. There was some sympathy for holding the meeting at Asilomar. Drs. Peishoff and Shoichet agreed to chair the meeting.

Workshop Participants

Christopher P. Austin, M.D.
Senior Advisor to the Director for Translational Research
Director, NIH Chemical Genomics Center
National Human Genome Research Institute
National Institutes of Health
Building 31, Room 4B09
31 Center Drive
Bethesda, MD 20892
Tel: 301-594-6238
Fax: 301-402-0837

Jeremy M. Berg, Ph.D.
National Institute of General Medical Sciences
National Institutes of Health
45 Center Drive MSC 6200
Bethesda, MD 20892-6200
Tel: 301-594-2172
Fax: 301-402-0156

Jeff Blaney, Ph.D.
Vice President, Lead Discovery
Structural Genomix
10505 Roselle Street
San Diego, CA 92121
Tel: 858-228-1495
Fax: 858-558-0642

James Cassatt, Ph.D.
Cell Biology & Biophysics Division
National Institute of General Medical Sciences
National Institutes of Health
45 Center Drive MSC 6200
Bethesda, MD 20892-6200
Tel: 301-594-0828
Fax: 301-480-2004

Wendy Cornell
Molecular Systems
Basic Chemistry
Merck Research Laboratories
126 East Lincoln Avenue
Rahway, NJ 07065
Tel: 732-594-4954

Ernesto Freire, Ph.D.
Henry Walters Professor
Biology and Biophysics
Johns Hopkins University
3400 N. Charles Street
114 Mudd Hall
Baltimore, MD 21218
Tel: 410-516-7743
Fax: 410-516-6469

Michael K. Gilson, M.D., Ph.D.
Professor and CARB Fellow
Center for Advanced Research in Biotechnology
University of Maryland Biotechnology Institute
9600 Gudelsky Drive
Tel: 240-314-6217
Fax: 240-314-6255

Barry Honig, Ph.D.
Principal Investigator
Department of Biochemistry
Columbia University
Box 36, BB2-221
630 West 168th Street
New York, NY 10032
Tel: 212-305-8283
Fax: 212-305-6926

William L. Jorgensen, Ph.D.
Department of Chemistry
Yale University
New Haven, CT 06520-8107
Tel: 203-432-6278
Fax: 203-432-6299

Leslie A. Kuhn, Ph.D.
Professor, Biochemistry & Molecular Biology, Computer Sciences & Engineering, and Physics & Astronomy
Co-Director, Quantitative Biology & Modeling Initiative
Michigan State University
502C Biochemistry Building
East Lansing, MI 48824-1319
Tel: 517-353-8745
Fax: 517-353-9334

Deborah A. Loughney
Director, Computer-Assisted Drug Design
Bristol-Myers Squibb Company
P.O. Box 4000
Princeton, NJ 08543-4000
Tel: 609-252-6054
Fax: 609-252-6012

Barbara Mittleman, M.D.
Chief, Scientific Interchange Section
Office of Science Technology
National Institute of Arthritis and Musculoskeletal and Skin Diseases
National Institutes of Health
Building 10, Room 9N118A
Bethesda, MD 20892
Tel: 301-402-7696
Fax: 301-402-0765

Manuel A. Navia, Ph.D.
Drug Development Strategic Advisor
Oxford Bioscience Partners
222 Berkeley Street, Suite 1650
Boston, MA 02116
Tel: 617-357-7474
Fax: 781-389-0686

Arthur J. Olson, Ph.D.
Department of Molecular Biology
The Scripps Research Institute
La Jolla, CA 92037
Tel: 858-784-9702
Fax: 858-784-2860

Catherine E. Peishoff, Ph.D.
Site Director, Computational Analytical & Structural Sciences
1250 S. Collegeville Road
UP12-210, PO Box 5089
Collegeville, PA 19426
Tel: 610-917-6585
Fax: 610-917-7393

Brian Shoichet, Ph.D.
Department of Pharmaceutical Chemistry
University of California San Francisco
1700 4th Street, QB3 Building
Room 508D
San Francisco, CA 94143-2550
Tel: 415-514-4126
Fax: 415-502-1411

Janna P. Wehrle, Ph.D.
Program Director
Cell Biology & Biophysics Division
National Institute of General Medical Sciences
National Institutes of Health
45 Center Drive MSC 6200
Bethesda, MD 20892-6200
Tel: 301-594-5950
Fax: 301-480-2004