7/10/2003 8:00 AM
7/11/2003 5:00 PM
The National Institute of General Medical Sciences organized the first workshop on data management for the Protein Structure Initiative (PSI). The workshop was held on July 10–11, 2003, at the National Institutes of Health, in Bethesda, Maryland. This meeting brought together experts in the fields of bioinformatics and biological data management, from the nine PSI pilot research centers, other structural genomics research laboratories, structural genomics companies, genome-sequencing centers, and the Protein Data Bank (PDB). The goal of the workshop was to promote collaboration and sharing of resources and knowledge and to explore the feasibility of centralizing and standardizing some of the data collected by the structural genomics centers. The meeting consisted of three major components:
It was apparent at the workshop that the PSI centers had made impressive progress during their first 2 years of operation, and presentations at the meeting highlighted elements of this progress. The data management effort has evolved from simple data collection and storage into a comprehensive effort that includes information integration and mining, target selection and prioritization, experiment design and tracking, automated data collection and processing, and automated dissemination and report generation. At the PSI centers, data management is becoming a central part of strategic decision making, as well as of day-to-day research operations. Many of the centers have been putting significant resources and personnel (up to one-third of the total budget) into their data management components, an investment that recognizes the importance of data management for the overall success of the PSI centers. However, researchers from some centers argued that the resources devoted to data management should be kept at a more modest level to avoid compromising the experimental effort. The outcomes of these two approaches will become clearer in the next few years.
During the discussion sessions, meeting participants raised a number of topics. Some informatics experts commented that it is difficult to obtain important details on experimental procedures and results from the experimentalists. Others offered solutions to this problem through extensive bar coding, remote access, and automated data entry with minimal human intervention.
The merits of centralization versus decentralization were also debated. Many participants noted that a centralized monolithic database for the entire PSI could provide efficiency and uniformity but would be less likely to meet all the needs of centers working on different systems using different approaches. A preferable alternative would be a federated architecture that allows local databases to be fully tailored to the needs of particular centers while enforcing communication with a central repository to enable data mining across centers. This discussion led to consideration of standards for data exchange and communication. The TargetDB at PDB was constructed initially to minimize overlap of effort among the PSI centers on similar targets. This resource has been very useful to the scientific community and has attracted contributions from many other structural genomics laboratories around the world. The workshop participants recommended expanding the TargetDB into a central repository for the protein expression and crystallization data collected by the structural genomics laboratories, to allow access and creative mining by more scientists. Following the meeting, a planning committee was established to formulate the data definitions, standards, and requirements for this central data repository.
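The federated arrangement discussed at the workshop can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class names, the three shared fields, and the center and target identifiers are hypothetical and do not represent the TargetDB schema or any PSI center's actual system. The idea shown is only that each center keeps whatever local detail it needs, while exporting a small agreed-upon subset of fields to a central repository that supports queries across centers.

```python
from dataclasses import dataclass, field

# Hypothetical shared exchange schema: the minimal fields every center
# agrees to report. These names are illustrative, not the TargetDB schema.
SHARED_FIELDS = ("target_id", "center", "status")


@dataclass
class LocalDatabase:
    """A center's own database; free to store any extra fields it needs."""
    center: str
    records: list = field(default_factory=list)

    def add(self, target_id, status, **local_details):
        # Local details (plate IDs, notes, ...) stay in the local record.
        self.records.append(
            {"target_id": target_id, "center": self.center,
             "status": status, **local_details}
        )

    def export(self):
        """Project local records onto the shared schema for exchange."""
        return [{k: r[k] for k in SHARED_FIELDS} for r in self.records]


@dataclass
class CentralRepository:
    """Holds only shared-schema records, keyed by (center, target_id)."""
    entries: dict = field(default_factory=dict)

    def ingest(self, exported):
        for rec in exported:
            missing = [k for k in SHARED_FIELDS if k not in rec]
            if missing:
                raise ValueError(f"record missing shared fields: {missing}")
            self.entries[(rec["center"], rec["target_id"])] = dict(rec)

    def centers_working_on(self, target_id):
        """Cross-center query: which centers report this target?"""
        return sorted(c for (c, t) in self.entries if t == target_id)


# Two hypothetical centers with different local fields.
center_a = LocalDatabase("CenterA")
center_b = LocalDatabase("CenterB")
center_a.add("T001", "crystallized", robot_plate="P-17")
center_b.add("T001", "expressed", construct_notes="His-tag")
center_b.add("T002", "selected")

central = CentralRepository()
for db in (center_a, center_b):
    central.ingest(db.export())

overlap = central.centers_working_on("T001")
print(overlap)  # ['CenterA', 'CenterB']
```

In this sketch the central repository never sees the center-specific fields, which is one way to keep local databases fully tailored while still detecting overlapping work on the same target. A real exchange standard would need far richer definitions than the three fields assumed here.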
This data management workshop accomplished its goals through active participation of everyone who attended the meeting. It provided a good avenue for productive information exchange. As one participant told staff of the National Institute of General Medical Sciences, the workshop was “exciting and informative.”