Modeling Scientific Workforce Diversity II

October 17, 2008


On October 3, 2007, the National Institute of General Medical Sciences convened a working group to assess the feasibility of computer modeling to guide policy makers in their efforts to increase the diversity of the scientific workforce. The group consisted of experts in workforce development; the training, recruitment, and retention of minority students and faculty; sources of data; and computational modeling. It made specific recommendations about next steps, including:

  • A concerted modeling effort, focused on understanding the underlying dynamics that produce successful scientists, identifying questions in need of research, and stimulating the collection of data, would be valuable for program design and policy-making.
  • The complexity of modeling workforce dynamics could be minimized by breaking the problem into a series of subsets and developing a suite of models focused on specific areas.
  • In a modeling effort, the academic job market should not be viewed in isolation from the larger system of scientific workforce dynamics. Decisions about NIH training influence—and are influenced by—larger societal and economic dynamics.
  • Coarse-grained models can be designed to look at the whole pipeline focusing on long-term dynamics. Fine-grained models could be focused on career choices and risk.
  • A modeling effort should ultimately focus on providing policy and decision makers with additional tools to help them understand the consequences of their actions.
  • NIGMS should work with other groups and agencies to identify policy-relevant questions that are amenable to modeling given existing data. In addition, NIGMS should support more collection and analysis of data on current NIH training activities.

Finally, the working group recommended that NIGMS hold a follow-up meeting of experts in computational modeling to determine if such a modeling effort is feasible and, if so, to make specific recommendations about how a modeling project could be developed. This follow-up meeting was held on October 17, 2008.

The Task

Under the direction of Elias Zerhouni, NIH has begun an intentional effort to ground policy in data and quantitative analyses. Policies and programs that affect the dynamics of the scientific workforce are particularly important to NIH's and NIGMS's missions. Jeremy Berg described how modeling could improve decisions in two areas: enhancing minority participation in science and supporting new investigators. Even simple models that capture the relationship between when programs or decisions are enacted and when outcomes can be measured would be enormously helpful.
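As an illustration only, the kind of simple timing model described above might be sketched as follows. The function name, the fixed lag, and the single completion rate are all assumptions made for this example; none of them comes from the working group, and the numbers are placeholders rather than data.

```python
def expected_outcomes(entrants_by_year, lag_years, completion_rate):
    """Map each entering cohort to the year its outcome becomes measurable.

    entrants_by_year: dict of {entry year: number of trainees} (hypothetical input)
    lag_years:        assumed fixed delay between entry and measurable outcome
    completion_rate:  assumed fraction of each cohort that reaches the outcome
    """
    return {
        year + lag_years: count * completion_rate
        for year, count in entrants_by_year.items()
    }


# Placeholder scenario: a program grows from 100 to 150 entrants in 2009.
# With an assumed 6-year lag, the change cannot show up in outcomes before 2015.
outcomes = expected_outcomes({2008: 100, 2009: 150}, lag_years=6, completion_rate=0.5)
print(outcomes)
```

Even a sketch this small makes the point explicit: a decision made in one year cannot be evaluated until the lag has elapsed, so the evaluation timeline has to be planned alongside the program.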

Evaluating the success of training programs is a huge challenge. Juliana Blome reported that most training evaluation projects have been driven by available data, not by the needs of policy makers. It is generally not possible to attribute specific outcomes to specific interventions. Clif Poodry pointed out that we need to know where to invest: in individuals, in institutions, or in curricula, for example. Further, we need to identify and challenge assumptions, because debunking myths may be a significant outcome of a modeling effort. He urged the modelers to take account of such questions as "What makes a field desirable?", "Are there differential effects of programs and policies on minorities and women?", and "Are there ways to amplify the effects of interventions?"

Stephen Eubank summarized the views of the working group when he challenged the metaphor of a pipeline as a description of scientific workforce dynamics. The fallacy, he explained, is in assuming that people are aiming at a goal. In fact, we know little about what drives people's decisions at various stages of their development. He compared historical expectations to a Laffer curve, which purports to show the relationship between tax rates and revenue. Although we may understand the relationship at the beginning and end, we may not know anything about what happens in the middle. The dynamics of the workforce, like those of economic systems, are complex and almost certainly depend on a variety of external forces and feedback. If the system is a network, the pipeline metaphor may lead us to focus attention on interventions that have little impact.

It is necessary, but insufficient, that models explain historical data and dynamics. Therefore, a milestone should be developing models that are consistent with existing data.

Flexible modeling strategies could allow for research and analysis from a variety of perspectives. Ross Hammond noted that agent-based modeling allows for heterogeneities in the roles of actors (e.g., students, teachers, faculty), timing of interventions, economic landscape, institutional missions, and policy choices. The modeling process should start with breaking down the system into manageable pieces, possibly by identifying key transition points, and building up carefully and deliberately.
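A minimal sketch of what an agent-based model built around transition points could look like is given below. It is not the working group's model: the stage names, the group labels, and the rule that an agent who does not advance leaves the track are all simplifying assumptions introduced for this example, and the probabilities are placeholders rather than estimates.

```python
import random
from dataclasses import dataclass

# Hypothetical transition points; the report does not specify these.
STAGES = ["undergraduate", "graduate", "postdoc", "faculty"]


@dataclass
class Agent:
    group: str           # illustrative subgroup label
    stage: int = 0       # index into STAGES
    active: bool = True  # False once the agent has left the modeled track


def step(agents, advance_prob, rng):
    """Move every active agent through one transition point."""
    for agent in agents:
        if not agent.active or agent.stage == len(STAGES) - 1:
            continue
        # Heterogeneity: the probability depends on both group and stage.
        if rng.random() < advance_prob[(agent.group, agent.stage)]:
            agent.stage += 1
        else:
            agent.active = False  # simplification: no re-entry or side paths


def run(n_per_group, advance_prob, seed=0):
    """Return, per group, how many agents reach the final stage."""
    rng = random.Random(seed)
    groups = sorted({group for group, _ in advance_prob})
    agents = [Agent(group) for group in groups for _ in range(n_per_group)]
    for _ in range(len(STAGES) - 1):
        step(agents, advance_prob, rng)
    return {g: sum(a.group == g and a.active for a in agents) for g in groups}


# Placeholder probabilities: group "B" faces a harder graduate-to-postdoc step.
baseline = {("A", s): 0.5 for s in range(3)}
baseline.update({("B", 0): 0.5, ("B", 1): 0.3, ("B", 2): 0.5})
print(run(1000, baseline))
```

An intervention can then be represented by changing a single entry in the probability table and rerunning. The one-way exit rule is the part most worth replacing first, since it bakes in the pipeline assumption that the working group questioned.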

Eric Jakobsson pointed out that we know little about the structure of networks that produce and retain scientists. As an example, he explained that protein interaction networks are large and diffuse, but interactions of protein domains are much more compact and interconnected. Just as we can build protein interaction models from protein domain networks, it may be possible to study individuals as collections of social modules such as schools, neighborhoods, mentors, and so on. The challenge is finding the appropriate functional units.

The role of industry may be more difficult to study, but is critical to understanding workforce dynamics. Jack Muckstadt spoke for the entire working group in urging that industry be included in models. He further urged that we include cost/risk/benefit analyses to examine the real costs and real outcomes of programs and policies.

Goals of Modeling

The modeling program envisioned by the working group is grounded in real data and focused on key policy and decision questions. Ultimately, models should provide policy and decision makers with additional information and analyses to contribute to informed decision making and to better, well-focused data collection and measurement of outcomes.

The goals of a modeling project should include the following:

  • Develop a layered set of models that address, as comprehensively as possible, the short- and long-term implications of specific policies and programs
  • In the short term, reproduce historical dynamics and trends as represented by current data, including aggregated life histories of subgroups of scientists
  • Reproduce the current (baseline) structure of workforce demographics, variation, and dynamics
  • Identify and test key assumptions that contribute to policy and program development
  • Identify external factors that have major effects on scientific workforce dynamics; economic incentives and disincentives are of particular interest
  • Identify factors that have differential effects in various subgroups such as young investigators, minorities, and women
  • Identify major effects on outcomes. For example, one could ask if transformative events are rooted in personal relationships (e.g., mentoring), institutions (e.g., student and faculty recruitment), or programs (e.g., predoctoral training programs, postdoctoral fellowships)
  • Take account of economic, cultural, and social influences, including diversity among subgroups and existing research on how people make career decisions
  • Consider the implications of how programs, decisions, and policies are implemented in a heterogeneous environment
  • Examine the implications of lifting interventions. How long should programs be kept in place?
  • Provide a framework for additional research on workforce development

Numerous reality checks are critical to the success of modeling projects and should be built into the infrastructure. For example, it is critical to involve advisors with a wide range of expertise and experience to guide the project. The working group pointed out that the MIDAS (Models of Infectious Disease Agent Study) organizational model has been successful in promoting constructive conversation about modeling questions and in developing a variety of modeling approaches suitable for different situations. They recommended that an NIGMS-supported program to model scientific workforce diversity should be organized, like MIDAS, as a collaborative network of individual groups.

An Unanswered Question

The purposes to which models are put will define the scope of a modeling program. For example, if one wants to anticipate the demand for scientists, one would need a model with broad scope and a long-term focus. If one wants to study means for implementation, a narrower, targeted model is called for. Defining a clear set of questions in collaboration with key communities is fundamental.

What is the appropriate scope? Should a project include the medical and educational workforces? Technical staff? What areas of science should be included (or excluded)? How should industry be involved?


The 2008 working group agreed that available modeling approaches are suitable for developing a set of models of scientific workforce dynamics. Model results should help address a variety of questions related to training, diversity, incentives, life history, and impacts of policies and programs. The working group recommended that models include economic, social, and cultural factors as well as research on decision-making strategies. Currently available data will be used to set parameters and to validate modeling approaches. One expectation is that models will inform future data collection efforts. The working group also agreed that this effort should include a diversity of modeling approaches and research groups that will work in close collaboration with NIGMS and its advisors.