NIGMS Protein Structure Initiative (PSI)

Location:
National Institute of General Medical Sciences, NIH

Start Date: 3/7/2002 8:00 AM

End Date: 3/8/2002 4:00 PM

Disclaimer

This report was developed solely by the participants of the Protein Production Workshop, held March 7-8, 2002. The National Institute of General Medical Sciences accepts no responsibility for the opinions expressed by the report's authors, including those regarding any commercial products, processes, services, or manufacturers mentioned in the report.

Introduction

The Protein Structure Initiative (PSI) of the National Institute of General Medical Sciences (NIGMS) is part of an international effort in structural proteomics. The aim of this effort is to determine one or more representative structures from each of the several thousand protein families in nature. International research on structural proteomics is being driven by the availability of a rapidly expanding database of genomic sequences. The ultimate goals are to understand the evolutionary implications of protein structures and to elucidate their biochemical and biophysical functions in relation to their three-dimensional structures. Knowledge of the protein structures, and development of the corresponding reagents and technologies through the International Structural Proteomics Initiative, will provide the foundation for scientific and technological infrastructure that will broadly and importantly influence biomedical and biomolecular research well into the 21st century.

The production of protein samples that are suitable for X-ray diffraction and/or analysis in solution using nuclear magnetic resonance (NMR) is a major challenge for conducting high-throughput analysis of protein structures. Some experts have suggested that production of these protein samples is the only real barrier to the determination of protein structures using high-throughput methods. Researchers participating in the nascent PSI already are making important advances in the technology of protein production for structural biology. This technology will be valuable to the broad molecular bioscience community and is certain to be one of the important, definitive successes of the initiative.

The production of proteins for research on structural proteomics includes selection of target proteins; cloning and recombinant expression of them; processing (high-density fermentation, labeling with selenium and/or stable isotopes for NMR studies); purification and analytical characterization; and organization of results and reagents into databases and reagent libraries.

The NIGMS convened the Protein Production Workshop in March 2002 to specifically address high-throughput methods for cloning, production, and purification of proteins suitable for X-ray crystallography and NMR studies. The workshop brought together researchers from nine center grant (P50) and two program project grant (P01) programs, which have been funded by NIGMS as pilot projects in structural proteomics, and investigators from several international efforts. The primary goal was to nurture the development of the technology through sharing of ideas, problems, and progress by experts from the various projects. A particular aim was to encourage the participating groups to establish contacts and collaborations. Although other issues (e.g., the selection and crystallization of target proteins, the collection and analysis of structural data, and the integration of databases) are critical to the overall research effort, NIGMS chose not to focus on these topics at this meeting.

The workshop consisted of four sessions: invited speakers, two sessions of presentations from investigators at the NIGMS research centers in structural genomics, and general discussion. This report summarizes the opening remarks, the presentations and discussion, and final comments.

Opening Remarks - J. Norvell, NIGMS Program Director

In 1999, NIGMS initiated the PSI to develop a national program of research support in structural genomics. The initiative was based on recommendations from several workshops, and NIGMS subsequently funded nine research centers as pilot projects. Dr. Norvell summarized the overall goals for PSI. The initiative is a cooperative, large-scale effort to determine the structures of unique, non-redundant proteins by using high-throughput methods that will ultimately result in an inventory, and complete coverage, of naturally occurring proteins. The target proteins for PSI include proteins whose functions are known or unknown. Dr. Norvell pointed out that an important goal of PSI is to create a large research network and public resource. He emphasized that genome-directed selection of target proteins, registration of target proteins, and timely publication of coordinates and release of data are important. He discussed various options for rapid electronic publication of results. Dr. Norvell introduced several issues for discussion at the workshop. These included the development and application of new technology and new high-throughput methods and the sharing of protocols, materials, and samples.

Session I. Invited Speakers

Five international researchers described various aspects of the production of proteins.

Gateway Production Vectors and Maltose-Binding Protein Fusions - D. Waugh, National Cancer Institute, Frederick, Maryland

Dr. Waugh described the development of new Gateway production vectors based on maltose-binding protein (MBP) fusions. He noted that MBP fusion appears to be superior to all other fusion systems (e.g., GST, TRX) tested. MBP fusion offers efficient initiation of translation, better stability and protection against proteolysis, increased solubility of target proteins, and an affinity tag to help in purification. Interestingly, some proteins are in an inactive form after cleavage of the target protein from the MBP fusion. As a result, investigators must evaluate the "nativeness" of their protein structure (using circular dichroism and NMR data) after the protein is expressed. Dr. Waugh noted that after testing MBPs from several sources, investigators observed that the MBP from Pyrococcus furiosus was the most effective in solubilizing "passenger" proteins. He pointed out that expression of fusion proteins at lower temperature (25 degrees Celsius) enhanced solubility. Two of the best constructs evaluated to date are MBP fusion proteins with a TEV cleavage site and a hexaHis (His6) affinity tag next to the protein target at either the N- or C-terminus (MBP-attB-TEVsite-His6-passenger and MBP-attB-TEVsite-passenger-His6). These vectors, when used in conjunction with Gateway cloning technology, offer a powerful system for high-level and high-throughput production of proteins in Escherichia coli.

Large-Scale Production of Proteins for Structural Biology - D. Stuart, Oxford University, Oxford, United Kingdom

Dr. Stuart, who directs one of the largest structural genomics efforts in the United Kingdom, presented plans for a newly constructed facility for protein production. This effort is part of a larger consortium that includes synchrotron beamlines. In addition to X-ray crystallography and NMR, investigators will use cryo-electron microscopy for structural studies. For expression profiling, they will use microarray and Serial Analysis of Gene Expression (SAGE) technologies, as well as green fluorescent protein (GFP) fusions to track proteins. For purification of proteins, they will use expression vectors that have two affinity tags, including a three-residue tag with an epitope for a monoclonal antibody. Because of the diverse set of targets, the facility plans to express proteins in bacteria, insect, and mammalian cells. The goal is to process 1,000 clones a year. The resulting proteins will serve as a general resource for many projects. All processes, from bioinformatics to experimentation and data analysis, will be integrated using a laboratory information management system (NAUTILUS) and ORACLE database. The targets include proteins from herpes viruses, proteins involved in the function of immune cells, Zn fingers and transcription factors, protein-DNA complexes, and human proteins associated with cancer.

Protein Production and Isotope Enrichment in Cell-Free Systems - S. Yokoyama, RIKEN Institute, Yokohama, Japan

Dr. Yokoyama described the many important advantages of cell-free synthesis of proteins, which include easy manipulation and small volumes. In addition, linear DNA amplified by polymerase chain reaction (PCR) can be used as a template (nonspecific DNA is added to prevent degradation of the PCR templates), gene transcription can be coupled with protein translation in E. coli extracts, and the process can be automated easily. Expression of integral membrane proteins in the presence of detergents also is possible; however, the choice of detergent is critical. Dr. Yokoyama noted that Dr. Y. Endo at Ehime University, Japan, has developed a wheat germ cell-free protein expression system (in which translation inhibitors were eliminated) suitable for eukaryotic proteins. Using this system, investigators have had good success in expressing a number of small proteins from Arabidopsis thaliana. A workshop is being planned in Japan to disseminate this information to all interested researchers. Investigators have used the cell-free expression system for E. coli to produce protein domains that were predicted using homology-based methods and to label proteins with Cl-Tyr residues using suppressor tRNAs. In one example, investigators were able to advance from the production of proteins using Se-Met labeling in a cell-free expression system to the determination of protein structures within 2 weeks. Removal of the His tag and small adjustments in predicted locations of domain boundaries significantly improved the NMR spectra. Scientists at RIKEN Institute also are developing a mammalian system for expressing mammalian proteins.

Production and Purification of Proteins - C. Arrowsmith, Ontario Cancer Institute, Toronto, Canada

The Ontario Cancer Institute is associated with two of the NIGMS pilot centers for structural genomics in the United States (the Northeast Structural Genomics Consortium and the Midwest Center for Structural Genomics). Dr. Arrowsmith presented data on the large-scale production and purification of proteins. For determination of protein structure, the institute has cloned more than 2,000 genes and has expressed and purified more than 1,200 proteins from five organisms. Investigators have developed several standard protocols to improve their success rate and efficiency. The protocols include vectors with a hexa-His tag cleavable with thrombin or TEV protease and use of BL21-Gold I (DE3) strain and 2xM9 media with ZnCl2, thiamine, and biotin for protein expression. Up to 36 cultures are grown in parallel at 37 degrees Celsius, with protein induction at 15 degrees Celsius. For purification, the investigators use Ni-NTA resin in batch mode, which allows for purification of up to 18 proteins simultaneously. Dr. Arrowsmith noted that this rather "low-tech" approach has been highly successful and that investigators have determined more than 30 crystal structures and 14 NMR structures. She pointed out that, to cover the protein-fold space, investigators will have to use the genomes of multiple organisms when selecting target proteins. She also presented results which suggested that both NMR and X-ray crystallography are valuable for these efforts. For example, investigators at the institute were able to crystallize 7 of 32 proteins that had poor HSQC spectra and used NMR methods to determine the three-dimensional structures of several proteins that could not be crystallized. Dr. Arrowsmith pointed out that the "mining" of large data sets may reveal important structural trends and improve investigators' selection of target proteins.

Protein Production in Baculovirus Systems - D. Freemont, Washington University, St. Louis, Missouri

Dr. Freemont, who is a member of the Midwest Center for Structural Genomics, described some of the advantages of using the baculovirus system for production of eukaryotic proteins. For example, when using this system, eukaryotic proteins are properly folded, disulfide bridges are correctly formed, prolines undergo correct isomerization, and many posttranslational modifications (including proteolytic processing and glycosylation of certain proteins) are handled properly. Some of the disadvantages of this system are that protein expression in baculovirus is not really a high-throughput approach, viral stocks for expression need to be maintained, protein expression levels are low and often require elaborate optimization, and baculovirus expression is very expensive. Because the system is based on protein secretion, extracellular proteins are obvious targets. Investigators have developed several strategies for protein expression, including the addition of various fusion tags, expression of proteins with signal sequences, and expression of heterodimeric proteins on a single vector using dual promoters. The researchers have made good progress in labeling proteins that are secreted with Se-Met for multiwavelength anomalous diffraction (MAD) experiments, achieving incorporation of more than 98 percent. Despite this progress and the unquestionable value of these baculovirus methods of production, Dr. Freemont recommended that investigators first try to express eukaryotic proteins in E. coli and then, if this approach does not produce satisfactory results, consider using baculovirus vectors.

Session II. Presentations from NIH Structural Genomics Centers

Investigators from six of the NIGMS research centers in structural genomics reported on their research. Two presentations on the requirements for protein samples preceded these reports.

Requirements for Protein Samples for High-Throughput Crystallography and NMR Analysis - A. Joachimiak, Midwest Center for Structural Genomics and Argonne National Laboratory, and T. Szyperski, State University of New York at Buffalo, New York

Dr. Joachimiak described the requirements for preparing proteins for X-ray crystallography, particularly when using anomalous diffraction of selenium-labeled proteins. For this approach, proteins should incorporate at least 1 ordered Se atom per 100-150 residues. For crystallization, folded, homogeneous protein samples at concentrations of 5-25 mg/ml are generally required. The constructs should have as few flexible polypeptide segments or affinity tags as possible, because flexibility can impede the crystallization process. Although the protein does not have to be absolutely pure for crystallization, choosing methods for concentrating and storing protein samples that minimize aggregation is critical. Dr. Szyperski described the requirements for preparing protein samples for NMR studies. While NMR studies of larger proteins are possible, for research in structural proteomics, NMR studies can only be done routinely on proteins that have molecular weights of less than about 25 kilodaltons. Samples are generally prepared with uniform isotope enrichment with 15N, 13C, and sometimes 2H, which places special requirements on the expression and fermentation systems. Protein samples must be generated at 5-25 mg/ml concentrations at pH < 7.5 (and, ideally, at pH 6.5) and must exhibit good stability over several weeks in terms of chemical degradation, slow precipitation, and aggregation. The NMR samples for structure determination generally must be highly (> 97 percent) homogeneous.
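The sequence-level parts of these requirements can be pre-screened before any protein is made. The Python sketch below is illustrative only and is not a tool described at the workshop: the function names are hypothetical, every methionine is counted as a potential selenium site (whether a site is ordered in the crystal cannot be predicted from sequence), and the molecular weight is estimated with the common rule of thumb of about 110 daltons per residue.

```python
# Illustrative pre-screen of a target sequence against the sample
# requirements summarized above. All names here are hypothetical.

# Assumption: ~110 Da average residue mass, a common rule of thumb.
AVG_RESIDUE_MASS_DA = 110.0


def selenium_sites_ok(sequence: str, residues_per_se: int = 150) -> bool:
    """Return True if there is at least one Met per `residues_per_se` residues.

    Every Met is treated as a potential Se-Met site. This is optimistic:
    the N-terminal Met, for example, is often removed or disordered.
    """
    seq = sequence.upper()
    if not seq:
        return False
    met_count = seq.count("M")
    # met_count / len(seq) >= 1 / residues_per_se, written without division
    return met_count * residues_per_se >= len(seq)


def nmr_size_ok(sequence: str, max_kda: float = 25.0) -> bool:
    """Return True if the estimated mass is below the routine NMR limit."""
    approx_kda = len(sequence) * AVG_RESIDUE_MASS_DA / 1000.0
    return approx_kda < max_kda


if __name__ == "__main__":
    # Made-up example sequence, not a real target.
    example = "MKT" + "A" * 120 + "M" + "G" * 60
    print("Se-Met phasing plausible:", selenium_sites_ok(example))
    print("Routine NMR size:", nmr_size_ok(example))
```

The default of 150 residues per selenium is the looser end of the 100-150 range quoted above; passing `residues_per_se=100` applies the stricter end.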

Berkeley Structural Genomics Center - R. Kim, D. Busso, and J. Jancarik

Research at the Berkeley Structural Genomics Center is focused on proteins from "minimal organisms," including Mycoplasma genitalium (479 ORFs) and Mycoplasma pneumoniae (677 ORFs), as well as homologues of these proteins. The research group has explored several approaches to cloning and vector construction, including use of the Gateway technology and topoisomerase-facilitated (Topo) cloning. The researchers report a high rate of mutations in PCR, particularly within the primer regions. To screen a large number of vectors for protein expression, they have explored different approaches to extraction of soluble protein from E. coli cells. The investigators encountered significant numbers of false positives when using chemical reagents (e.g., BPerII, Pierce) for cell lysis. For evaluating the solubility of proteins, use of a Misonix 96-well sonicator was more effective than other methods, and the investigators generally obtained better results when not using salt. The researchers also compared in vivo and in vitro (cell-free) expression and solubility screening. This screening can be performed more quickly using the cell-free system and dot blots. The researchers confirmed that MBP fusion improves solubility of passenger proteins and that cleavage of the fusion can be improved by adding six glycines between the TEV protease site and the target protein. They determined that GFP, as a reporter of solubility, was too sensitive for correlating activity and solubility accurately. The researchers noted that the initial results on light scattering, from the data on crystallization screening, can yield critical clues about which additives to use when attempting to make a protein sample monodisperse.

Center for Eukaryotic Structural Genomics - B. Fox

Research at the Center for Eukaryotic Structural Genomics is focused on proteins from Arabidopsis. A major challenge for research on the structural proteomics of eukaryotes is access to suitable cDNA reagents for cloning. Dr. Fox noted that collaborators have provided approximately 51 cDNAs, but that these reagents were developed based on the unique interests of the collaborators, rather than on bioinformatic approaches. For this reason, the group has undertaken production of cDNAs by reverse-transcription PCR (RT-PCR), using mRNA from undifferentiated T87 cells. According to gene-chip analysis, up to 80 percent of the genes for Arabidopsis are expressed in these T87 lines. To date, investigators have cloned and amplified more than 50 percent of approximately 705 targeted genes. They have observed marked differences in the ability of commercial polymerases to amplify in this system and obtained the best results with ExTaq and Yieldbase (with 1-2 errors per kilobase). Currently, the researchers are constructing restriction endonuclease/ligase vectors using NdeI and BamHI sites, which are compatible with many Arabidopsis targets. The researchers described their experience in using "fed-batch labeling," high-density fermentation, and cell-free expression on wheat germ-derived media. They have demonstrated 15N enrichment using the wheat germ cell-free system. During the first 5 months of the project, the group generated 441 cDNAs using RT-PCR and produced 190 expression plasmids.

Joint Center for Structural Genomics - S. Lesley

Dr. Lesley described the progress made in high-throughput expression, purification, and crystallization of proteins from Thermotoga maritima (involving 1,877 genes). The technologies being developed have a throughput of 10,000 clones a year and utilize processes such as (1) the GNFermentor system, with parallel 40 ml fermentations of many samples; (2) optimization of parameters for fermentation; (3) optimization of automation for cell lysis; (4) robotic purification of proteins with hexaHis tags; and (5) secondary purification, as needed, using FPLC. The robotic lysis of cells includes a combination of treatment with lysozyme, freezing and thawing, and sonication. Dr. Lesley described the progress made in using the arabinose promoter, which provides much better control of basal protein production. With better control, investigators can improve synchronization of protein induction, which is important for parallel fermentation processes. Dr. Lesley noted that investigators have also established systems for producing proteins in Pichia and baculovirus and that these require large investments of time and resources. He also described the use of gene expression analysis in E. coli to monitor cell conditions characteristic of the overproduction of misfolded proteins. This analysis demonstrated that gene-chip expression profiles are different for cells that produce folded proteins compared with cells that produce misfolded proteins. He noted that, overall, the research group produces 96-192 proteins per week. Production is followed by robotic crystallization screens, which require 50 nanoliters of protein per drop. Many of the 172 crystals that investigators have sent to the Stanford synchrotron facility show good diffraction, even though almost all of the proteins have hexaHis affinity tags.

Midwest Center for Structural Genomics - A. Savchenko, F. Collart, I. Dementieva, and P. Laible

The investigators discussed the center's protein production pipeline, which includes different approaches to high-throughput cloning of genes, expression of proteins, and large-scale production of proteins. The research team has established standard protocols which are being implemented at the center's sites. The researchers have cloned more than 1,000 target proteins from Bacillus subtilis, E. coli, Haemophilus influenzae, Methanobacterium thermoautotrophicum, and T. maritima. The strategy is multiplex and includes parallel manual and automated approaches for high-throughput generation of soluble expression clones. The researchers are developing an automated process that uses a ligation-independent cloning system in a 96-well format, and they are assessing the production level and solubility of the proteins produced. To date, they have moved more than 300 target proteins through the production pipeline. Expressed proteins are purified in parallel using a semi-automated method that implements IMAC and gel filtration. Purification tags are removed by cleavage with a TEV protease before the crystallization screens, an important step because affinity tags could be detrimental to the growth of crystals and the quality of their diffraction. Proteins that are produced using this approach are of high quality and yield good-quality crystals. In addition, the center's approach offers a potentially significant increase in the efficiency and speed of protein production. The pipeline for soluble proteins is being integrated with an expression system for membrane proteins that is based on Rhodobacter. The researchers have engineered this organism to enable them to coordinate the synthesis of foreign membrane proteins with that of a new membrane to provide a matrix for incorporating the newly synthesized target proteins. The researchers are currently optimizing this system using a set of 150 membrane targets from multiple organisms.

Northeast Structural Genomics Consortium - G. Montelione, M. Inouye, and L. Ma

The researchers described their progress in producing target proteins from families of eukaryotic proteins. They have targeted approximately 2,100 proteins from eukaryotic or prokaryotic "reagent genomes." Each protein is a representative from a large domain family ("Rost clusters") that includes at least one representative from the proteomes of several eukaryotic "target" organisms. The researchers are targeting multiple members of each family from one or several of the reagent genomes. For cloning, they are using a modified version of the pET production system (a "multiplex vector" system), which yields a set of different constructs from a single PCR product. The researchers are implementing the process in a 96-well format using a Qiabot 8000 robot. Investigators at the consortium's Rutgers and Toronto sites have together cloned approximately 800 of these targets and have screened them for expression and solubility. Approximately 40 percent of the proteins have good expression and solubility, and approximately 220 have been scaled up and purified (at a level of tens of milligrams) for crystallization screening and NMR studies. The researchers discussed the shortcomings of the Gateway vector system. Professor Inouye also described the progress made with a novel "cold shock vector," using the E. coli cold shock promoter, both to selectively express target proteins and to selectively enrich them with NMR-active isotopes. The researchers also described the progress made in using RT-PCR to generate large numbers of cDNAs for eukaryotic targets and in stabilizing linear DNA templates for screening target expression in cell-free production systems.

New York Structural Genomics Consortium - S. Burley

This consortium has recently merged its efforts with Structural GenomiX, Inc. (SGX), a structural genomics research company. Researchers at Rockefeller University are selecting the target proteins, and researchers at SGX will perform most of the production and crystallization of the selected proteins. This relationship brings significant commercial resources to the overall effort. For cloning, the investigators are focusing on the ligation-independent topoisomerase approach, which uses affinity-tagged proteins and vectors being developed in Prof. C. Lima's lab at Cornell University. They have observed that tag cleavage is highly efficient when using "polioviral protease" instead of the traditional TEV protease cleavage site. The consortium also has developed a double-tagged production system that combines Smt3 and His6 affinity tags for rapid purification. The researchers have implemented these cloning technologies in a 96-well format using a Qiabot 8000 robot. They noted that the BL21(pLysIce) strain provides more efficient cell lysis and solubilization. By adopting a multigenomic approach (which utilizes multiple targets from each family), the researchers have obtained soluble proteins for approximately 80 percent of the targets. They also have explored, with mixed success, the mutagenesis and intragenic shuffling of GFP fusion proteins for improving solubility. Dr. Burley pointed out that development of a LIMS is very resource intensive and perhaps best done by a commercial entity. With an integrated storage system for images generated in robotic screening, which can archive approximately 1 million images, researchers could evaluate as many as 100,000 crystallization trials per day. The researchers have constructed a dedicated beam line, SGX-CAT, at the Advanced Photon Source at Argonne National Laboratory.

Session III. Presentations from NIH Structural Genomics Centers

Six additional presentations were made by investigators from three NIGMS Research Centers in Structural Genomics and two related Program Projects, and by the planning group for an NIGMS Protein Structure Initiative Materials Repository.

Southeast Collaboratory for Structural Genomics - M. Adams, M. Luo, and H. Dailey

This center is involved in the production of proteins from P. furiosus, Caenorhabditis elegans, and human genes. The group has cloned 1,465 ORFs, expressed 242 proteins, purified 200 proteins, and obtained 24 crystals from the proteins of P. furiosus. The researchers used ICP-MS to determine the metal ion contents of the protein samples. For the cloning of C. elegans genes, they selected the Gateway system and used an ELISA-based assay to determine, on a small scale, the expression and solubility level of the target proteins. The group has cloned 1,130 genes, expressed 369 proteins, purified 32 proteins, and crystallized 5 C. elegans proteins. For the cloning of human proteins, the group has selected the pTrcHis vector.
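The counts reported above make it easy to see where targets are lost along the pipeline. The following Python sketch is a minimal illustration rather than an analysis presented at the workshop: the function name and stage labels are hypothetical, the numbers are those quoted in the paragraph above, and the calculation assumes that each stage is a subset of the previous one.

```python
# Stage-to-stage attrition for the two pipelines summarized above.
# Function name and stage labels are hypothetical; counts are from the text.

P_FURIOSUS = [("cloned", 1465), ("expressed", 242), ("purified", 200), ("crystals", 24)]
C_ELEGANS = [("cloned", 1130), ("expressed", 369), ("purified", 32), ("crystallized", 5)]


def stage_yields(counts):
    """Return (stage, count, fraction of previous stage, fraction of first stage).

    Assumes a non-empty list of positive counts in pipeline order.
    """
    first = counts[0][1]
    prev = first
    rows = []
    for name, n in counts:
        rows.append((name, n, n / prev, n / first))
        prev = n
    return rows


if __name__ == "__main__":
    for label, counts in (("P. furiosus", P_FURIOSUS), ("C. elegans", C_ELEGANS)):
        print(label)
        for name, n, step, overall in stage_yields(counts):
            print(f"  {name:<13}{n:>6}  step {step:6.1%}  overall {overall:6.1%}")
```

Reporting both the step yield and the overall yield separates a stage that is itself inefficient from one that merely inherits few inputs from earlier stages.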

Structure 2 Function Pilot Project at CARB/TIGR - O. Herzberg

Dr. Herzberg presented data on the performance of various vectors and the influence of tags on the efficiency of crystallization and the quality of crystals. The investigators cloned genes from H. influenzae in several different expression systems, which included an Intein system, use of T7 promoters, the Gateway system, and directional topoisomerase system. They found that native proteins expressed without a hexaHis tag were more likely to form diffraction-quality crystals (12 structures/29 native proteins) than were proteins that were produced with hexaHis tags which were then removed by proteolysis (8 structures/41 proteins). The overall success rate for obtaining a structure was almost twice as high when using native proteins produced without hexaHis tags (41 percent) than when using proteins with the tags removed (25 percent).

Structural Genomics of Integral Membrane Proteins - R. Nakamoto

This project is focused on development of technology for structural studies of integral membrane proteins. The research group is constructing a library of membrane proteins expressed from M. tuberculosis (Mtb). The researchers are using two-dimensional X-ray crystallography and electron microscopy, solid- and solution-state NMR, and three-dimensional X-ray crystallography. They have identified approximately 1,160 potential membrane proteins of Mtb, which they have divided into target groups based on the predicted number of transmembrane alpha-helices. The researchers will use the Gateway system for expression of the proteins. Because expression should be slow for membrane proteins, they will use minimal media, a lower copy number of plasmids, and lower temperatures than used for the expression of other proteins. One major bottleneck in this sample production pipeline for membrane proteins is the need to isolate membrane fractions by high-speed ultracentrifugation.

Structural Genomics of Pathogenic Protozoa Consortium - M. Dumont and C. Mehlin

This consortium has two centers for protein expression, one in Seattle, Washington, and a second in Rochester, New York. The research groups at both centers will work on soluble protozoan proteins, and each will address a set of uniquely challenging targets. The Seattle group will produce proteins from P. falciparum, an organism that has an extremely AT-rich genome and ill-defined "start and stop" points for its genes. The Rochester group will focus on production of membrane proteins. The Seattle group tentatively plans to use a directional topoisomerase cloning approach. Because conventional Topo vectors (e.g., Invitrogen) place large, hydrophobic amino acids at the N-terminus of expressed proteins, the Seattle group has outlined a custom vector which avoids this problem by inverting the direction of the insert into the vector. The Rochester group will explore ligation-independent cloning technology, because of its speed, low background, directionality, and low cost. These researchers also have extensive experience and interest in using Pichia pastoris as a host system for expressing membrane proteins. This system allows for rapid cloning and posttranslational modification, is inexpensive, and has been used for expression of membrane proteins. Both centers plan to use a hexaHis affinity tag system.

TB Structural Genomics Consortium - M. Park and C. Kim

The researchers described a platform for large-scale production of proteins from M. tuberculosis based on a 96-well-format cloning strategy and use of simple, modular liquid-handling systems (e.g., Hydra 96), a vacuum manifold system, and a plate sonicator for 96-well microtiter plates. They reported development of a GFP-superfolder vector for the screening of clones for positive expression and use of "Terrific Broth" fermentation to obtain high densities of bacterial cells. They also used GFP as a reporter for cell disruption. The researchers also described use of a perfusion chromatography platform for rapid screening of conditions for ion-exchange chromatography and for development of a low-to-medium-cost chromatography system with parallel gel filtration.

NIGMS Protein Structure Initiative Materials Repository - C. Lewis

Dr. Lewis presented a plan to develop an NIGMS Materials Repository that will store and distribute materials generated by the PSI centers. NIGMS has proposed a tentative schedule to award a contract for this effort by August 2003. Dr. Lewis described the NIGMS Human Genetic Cell Repository as a model for the Materials Repository and requested advice on the organization of the effort. NIGMS specifically seeks input on the following topics: definition of scope and need, materials to be stored, quality control measures, criteria for an efficient data tracking and storage system, and potential contractors.

Session IV. General Discussion

During an informal discussion, the participants proposed the following actions:

  1. Creation of a Web-based bulletin board or chat box devoted to research on protein production, to enable scientists to share information.
  2. Publication of the collected papers from the present meeting as an issue of the Journal of Structural and Functional Genomics, to be recommended to the editors of the journal.
  3. Adoption of the same workshop format for next year's meeting, and consideration of the possibility of inviting biotechnology vendors to participate in the meeting.

Comments from individual scientists addressed several topics, as summarized below.

Evidence-based data (vs. anecdotal statements). Investigators are generating a large data set which should be mined to extract significant trends or procedures. A controlled vocabulary needs to be developed for this purpose.

Cloning. The Gateway system leaves several extra amino acids on a protein. The Intein system has not produced good results so far. Expression in baculovirus can be good, but the lines are not stable. While restriction-endonuclease-ligation systems are the most productive used to date, several groups are having good experience with topoisomerase-based methods. The Drosophila system produces stable lines, but the yields are low.

cDNA libraries. Researchers have observed that cDNA libraries undergo significant degradation upon propagation and that the original cDNA library is the most reliable. The NIH could facilitate research efforts by helping researchers access human tissues for generation of cDNA libraries and cDNA clones.

Protein solubility. Investigators have observed that growing cells at 15, 18, or 30 degrees Celsius slows down the production of proteins and often increases the solubility of proteins. The T7 promoter is very strong, and rapid production of proteins at higher incubation temperatures may lead to protein aggregation. Investigators have reported some success with insoluble proteins by completely denaturing the proteins in 6 M urea and 20 mM dithiothreitol, followed by dialysis in the presence of oxidized or reduced glutathione, arginine, and NaCl. When possible, it is important to determine whether a target protein requires cofactors or metals. Trace metals can be added to the media and may enhance the solubility of the protein.
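As a rough illustration of the denaturing conditions mentioned above (6 M urea and 20 mM dithiothreitol), the following sketch computes the mass of each solute needed for a given final volume. The molar masses are standard reference values and are not taken from the workshop report. The function names and the 0.5 L example volume are hypothetical, and the sketch says nothing about the dialysis step, whose concentrations the participants did not specify.

```python
# Minimal sketch: solute masses for a denaturing buffer of 6 M urea
# and 20 mM dithiothreitol (DTT), the concentrations quoted in the text.
# Molar masses are standard values, not figures from the report.
UREA_G_PER_MOL = 60.06
DTT_G_PER_MOL = 154.25

UREA_MOLAR = 6.0    # mol/L, from the text
DTT_MOLAR = 0.020   # mol/L (20 mM), from the text


def grams_needed(molar_conc, molar_mass, volume_l):
    """Return grams of solute for a target molarity in a final volume (L)."""
    return molar_conc * molar_mass * volume_l


def denaturing_buffer(volume_l):
    """Return the urea and DTT masses (g) for the given final volume (L)."""
    return {
        "urea_g": grams_needed(UREA_MOLAR, UREA_G_PER_MOL, volume_l),
        "dtt_g": grams_needed(DTT_MOLAR, DTT_G_PER_MOL, volume_l),
    }


if __name__ == "__main__":
    # Hypothetical example volume of 0.5 L.
    for name, grams in denaturing_buffer(0.5).items():
        print(f"{name}: {grams:.2f}")
```

The masses scale linearly with volume, so the same function can be used for any batch size. The solutes are dissolved and then brought up to the final volume, not added to that volume of water.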

DNA sequencing. Researchers at the different centers have reported different error rates. One center sequences the DNA of genes before sending the purified proteins to another facility for crystallization. Another center only sequences the DNA of proteins that have been crystallized.

High-density fermentation. Fermentors are of great benefit. Investigators have determined that lactose can be used as an inducer with BL21 cells (glucose, in contrast, causes suppression).

Storage of proteins. Investigators store proteins in two separate freezers, one for protein samples containing glycerol and one for those without glycerol. An investigator suggested that proteins that are being shipped to another site be attached to the IMAC affinity resin before shipping in order to better stabilize them.

Recordkeeping. Development and use of a controlled vocabulary and common LIMS among centers would be advantageous.

Final Comments

The NIGMS Protein Production Workshop was one of the first PSI meetings to address a specific issue that has been identified as a significant impediment to implementation of the program's goals. Determination of the structures of proteins that are targeted for research in structural genomics will succeed only if a continuous supply of protein samples with high purity and in milligram quantities can be assured. Although the requirements for X-ray crystallography and NMR differ, both methods depend on having protein samples of high quality. Moreover, the production of proteins has great value for research in general, beyond the specific goals of research in structural genomics, and will help meet the needs of research in biochemistry, functional proteomics, biology, and biotechnology.

The participants at the workshop accomplished the aims set for the meeting. In particular, they (a) focused on issues associated with the expression and production of proteins; (b) highlighted state-of-the-art production of proteins using large-scale, high-throughput approaches; (c) examined different approaches adopted in the pilot projects and reported on their successes and failures; (d) showed new directions that should be pursued to successfully implement the goals of the pilot projects; (e) created new relationships with fellow researchers who are working on protein production at the pilot centers; and (f) engaged in open and generous interactions with researchers from all the pilot centers, which included the sharing of data and exchange of information on methods and procedures. These accomplishments are key aspects in accelerating the progress that is being made at the NIGMS pilot centers in structural genomics.

The participants emphasized that the format of the workshop provided an excellent platform for open discussion and exchange of information, and most agreed that the format should be retained. A number of participants suggested that NIGMS increase the size of future meetings to accommodate the considerable public interest in this research. For future meetings, the participants encouraged NIGMS to seriously consider expanding the format of the workshops and to invite participants from related technology groups in industry. The participants strongly suggested that NIGMS schedule the next Protein Production Workshop for early spring 2003.

One of the main purposes in establishing the pilot centers was to undertake and test different approaches to the determination of protein structures. Investigators at the NIGMS research centers in structural genomics are pursuing diverse approaches to the cloning of genes and the expression and production of proteins. Some approaches appear to be more successful than others. The presentation of data, the exchange of ideas, and the discussion of future directions--on a regular basis and among investigators at all centers--are important for continuing to make progress and to avoid overlaps in research.