NIGMS Workshop on High-Accuracy Comparative Modeling
Natcher Conference Center, National Institutes of Health, Bethesda, MD
Co-Chairs: John Moult and Richard Friesner
Time: October 20-21, 2003
Place: Natcher Conference Center, National Institutes of Health, Bethesda, MD
Purpose of the workshop: The purpose of this workshop is to invite experts in protein structure modeling and related fields to identify the roadblocks and share their visions for improving the quality of comparative models. Structural genomics is a fast-growing area worldwide, following recent success in genome sequencing and technical breakthroughs in X-ray crystallography and nuclear magnetic resonance (NMR). The U.S. effort in structural genomics is mostly supported by the NIGMS Protein Structure Initiative (PSI). High-accuracy, experimentally determined 3-D structures of representatives of large protein families will be generated at an ever-increasing pace. Computational structure prediction will then use these experimental structures as templates for the other protein family members. Current homology modeling technology can produce structural models with a high success rate for backbones but only intermediate to low accuracy for side chains and loops.
Participation: Experts in protein structure modeling and other areas of biological simulation and modeling will gather to share their views on how to improve the accuracy of protein structure prediction using comparative modeling approaches. A limited number of seats will be reserved for observers on a first-come, first-served basis.
Report: A long-term major goal of the PSI is to produce useful models of all biological proteins, based on a representative set of experimentally determined structures. This goal recognizes that while ab initio modeling of protein structure remains difficult, models based on homology to proteins with known structure are always possible, and the shorter the phylogenetic distance between the two proteins, the higher the accuracy of the models. Modeling of this type is often termed ‘comparative’ or ‘homology’ modeling. Improvements in these modeling methods will greatly enhance the utility of PSI experimentally determined structures. In particular, obtaining models comparable in accuracy with experiment is a key objective. The goal of the workshop was to examine the current state of the art in producing comparative models, to identify bottlenecks to obtaining higher accuracy, and to identify ways of moving the field forward as rapidly as possible.
   
   
The first session addressed the interplay between experimental and computational methods, with talks by Rost (Tasks for Comparative Modeling in Structural Genomics), Kim (Experimental Structures: Accuracy, Reliability, and Uses), and Moult (Strengths and Bottlenecks in Contemporary Comparative Modeling). This was followed by a session with talks from seven experts in the field (Baker, Levitt, Friesner, Honig, Sali, Dunbrack, van Gunsteren), each presenting their views on the current state of the art, bottlenecks to progress, and how these may be overcome. The third session focused on applications of comparative modeling, particularly docking and drug design (Shoichet and Peishoff), the deduction of function from structure (Skolnick), and dissemination of models (Brenner). The second day of the workshop began with a short session on modeling of membrane proteins, a currently less advanced but critical area, with talks by Weinstein and Krystek. Bob Germain then described work at IBM, primarily focused on developing very fast machines for molecular dynamics simulations.
   
   
Discussion of the talks began with ‘new ideas’ contributions from Ponder, Jacobson, Berger and Murray, highlighting central points, and adding new insight. There followed a structured discussion, aimed at determining where there is consensus in the field, and in areas of disagreement, what the range of opinion is. The main conclusions are as follows: 
   
   
A first clear consensus is that comparative modeling methods are already extremely powerful. In addition to playing a central role in the PSI, the technique is of great importance in biology in general. At present, experimental structures are known for less than 1% of identified proteins, whereas relatively high-quality models can be produced for 10 to 20% of proteins, and some form of model is possible for up to 60% of the proteins coded for in particular genomes. Thus the vast majority of experimentalists work with models of their proteins of interest, rather than experimental structures. As more and more genomes are sequenced, reliance on models will continue to increase. In addition, models play an important part in a number of methods for obtaining structure. Two new applications described at the workshop served to illustrate this. Montelione pointed out that it is now possible to collect substantial amounts of NMR data for proteins very rapidly, and that these data could be most effectively used as restraints in generating structure models. Godzik briefly described results from a PSI center in which a set of protein models was used to significantly decrease the level of sequence similarity needed for successful use of molecular replacement methods in obtaining phases for X-ray crystal structure determination.
   
   
At the same time, it was also agreed that there is a clear need for greatly improved modeling performance, if full use is to be made of the experimental results generated by the PSI. A set of areas, activities and facilities where increased focus will result in rapid progress was identified, as follows:
- Methods to improve the refinement of comparative models. At present, maximum model accuracy is limited by the similarity of the best available template to the true structure. A key goal is to devise methods to reduce the overall atomic root mean square deviation between a model and the corresponding experimental structure to levels approaching experimental error. Some participants felt strongly that progress in other objectives listed below, particularly alignment and side chain accuracy, will depend on the emergence of successful refinement techniques.
 
- Methods to improve the alignment of a target protein sequence to those of available structural templates. While there has been substantial progress in this area, alignments based on sequence identity below about 30% are suboptimal. Emerging methods are expected to make extensive use of structural information. A number of participants reported encouraging developments along these lines, but there was a range of opinion as to how effective these will be, and further emphasis is needed.
 
- Improved scoring functions for use in model assessment and refinement. Present scoring functions are unable to reliably select the most accurate models from a large set of candidates, and so progress here is critical. Methods may be based on improved physical descriptions such as new functional forms for representing atomic interactions, polarization effects, and the role of solvation; or on improved integration of all forms of information that might contribute to determining features of a structure, for example information from multiple available templates, regions of sequence conservation, and knowledge of functional elements. There is a range of opinion in this regard, and so parallel efforts are appropriate.
 
- Methods of estimating uncertainty, errors and flexibility levels in models, both globally and locally. Several participants emphasized that in order to gain wide acceptance, the modeling community must develop methods for assessing these factors, and ensure that these data are routinely included in released structures.
 
- Methods of assessing the compatibility of ligand molecules with a protein structure, either experimental or modeled. Encouraging results on docking ligands to experimental and model structures were reported at the meeting, but it is clear that substantial progress is needed if these methods are to play the expected role in interpreting the specificity of newly determined structures and models.
 
- Methods of assessing the usefulness of a model. While improvement in specific modeling steps resulting in reduced overall deviation from experiment is crucial, the usefulness of a model is a more subtle property, and may be determined by very specific features, such as the positioning of side chains contacting a ligand.
 
- Accepted benchmarks and standards to assess progress in specific modeling steps, such as alignment, side chain building, active site accuracy, and refinement. A number of participants felt that the availability of such measures will significantly speed progress in the field.
 
- Mechanisms of improving co-operation and interaction within the field, including establishing the benchmarks and standards within particular modeling areas, and the development of integrated software platforms. There is broad agreement that relatively simple steps to improve co-operation in these ways will pay off substantially. Chairs of three working groups were selected, and charged with setting up the necessary infrastructure.
 
- Improved access to computing facilities. Many of the developments discussed are very compute-intensive, and so the availability of large computing clusters is seen as critical to progress. Several contributors emphasized that most modeling tasks can be carried out very well on loosely coupled clusters, and that the large investment being made in tightly coupled clusters is neither beneficial nor cost-effective in this field.
 
- Collaboration with experimentalists. Improvement in model quality will be driven by practical applications. Thus, collaboration with experimental groups, leading to more accurate and/or efficient approaches to structure determination and to more powerful analysis of function, is very desirable.
 
- Collaboration on the introduction of new approaches. It is expected that the needed methodological advances can partly be brought about by increased effort in existing programs, but there is also a need for the introduction of new concepts, algorithms and technologies. To this end, interaction between established researchers in the field and those in other areas, particularly mathematics, statistics and computer science, is desirable. These more exploratory approaches are inherently high risk, but may lead to the most dramatic progress.
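Two quantities named in the list above, the atomic root mean square deviation between a model and an experimental structure and the sequence identity of a target-template alignment, can be illustrated with a minimal Python sketch. It is not a method presented at the workshop. The function names are hypothetical, the coordinates are assumed to be already superposed with atoms in matching order, and identity is counted only over alignment columns without gaps, which is one of several conventions in use.

```python
import math


def rmsd(model, reference):
    """Root mean square deviation between two lists of (x, y, z) coordinates.

    Assumption: both structures are already superposed, and atom i in
    `model` corresponds to atom i in `reference`.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, reference):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(model))


def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumption: the denominator is the number of gap-free columns; other
    conventions (for example, the shorter sequence length) give other values.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0
```

For example, `percent_identity("ACD-F", "ACEGF")` compares four gap-free columns and finds three identical residues. In practice the RMSD would be computed after an optimal superposition of the two structures, a step this sketch leaves out.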
For more information about the workshop, contact Jerry Li, Ph.D., at (301) 594-0828 or e-mail lij@nigms.nih.gov.